Discussion:
parallelise not overlapping tasks
Romain Manni-Bucau
2018-01-19 13:20:50 UTC
Hi guys,

there is currently no way to parallelize non-conflicting tasks within the same
phase, right? Is there any chance this gets on the radar?

A common example is running all code analysis (findbugs, pmd,
checkstyle, ...) concurrently instead of waiting for one tool and then the
next, since each of them can take very long.

It could be nice and fancy to define part of the reactor parallelisability in
the pom, like:

<plan>
  <parallel>
    <phase>process-sources</phase>
    <executions>
      <execution>
        <groupId>...</groupId>
        <artifactId>...</artifactId>
        <version>...</version>
        <id>...</id>
      </execution>
      <execution>
        <groupId>...</groupId>
        <artifactId>...</artifactId>
        <version>...</version>
        <id>...</id>
      </execution>
    </executions>
  </parallel>
</plan>

anything to enhance it?
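
As a rough illustration of the goal above (not of how Maven works today), independent tasks of one phase could be submitted to a thread pool and awaited together. This is a minimal sketch; `ParallelPhaseSketch` and `runAll` are hypothetical names, and each `Callable` merely stands in for a plugin execution such as findbugs, pmd or checkstyle.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: runs the independent tasks of one phase concurrently. */
final class ParallelPhaseSketch {

    /** Submits every task to its own thread and returns the results keyed by task name. */
    static Map<String, String> runAll(Map<String, Callable<String>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            // Start all tasks first so that they overlap in time.
            Map<String, Future<String>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<String>> e : tasks.entrySet()) {
                futures.put(e.getKey(), pool.submit(e.getValue()));
            }
            // Then wait for each of them; the phase ends when the slowest task ends.
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : futures.entrySet()) {
                try {
                    results.put(e.getKey(), e.getValue().get());
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for " + e.getKey(), ex);
                } catch (ExecutionException ex) {
                    throw new IllegalStateException(e.getKey() + " failed", ex.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape the phase takes roughly as long as its slowest task rather than the sum of all tasks, which is the gain being asked for.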

Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> | Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau>
Robert Scholte
2018-01-19 14:08:22 UTC
Hi Romain,

I don't think this is something for the pom.
It is the plugin which should know whether it has an effect on the build.
Or, more globally: if we could define for each plugin what its input and
output are, then it should be possible to decide which parts can run
in parallel.

thanks,
Robert
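
A minimal sketch of the input/output idea, under the assumption that each execution can list what it reads and writes. `IoConflictSketch`, `Task` and the path strings are hypothetical and not part of any Maven API; paths are compared as exact strings, which a real implementation would need to refine.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IoConflictSketch {

    /** Hypothetical description of a plugin execution: what it reads and what it writes. */
    static final class Task {
        final String name;
        final Set<String> inputs;
        final Set<String> outputs;

        Task(String name, Collection<String> inputs, Collection<String> outputs) {
            this.name = name;
            this.inputs = new HashSet<>(inputs);
            this.outputs = new HashSet<>(outputs);
        }
    }

    /** Two tasks conflict if either one writes something the other reads or writes. */
    static boolean conflicts(Task a, Task b) {
        return !Collections.disjoint(a.outputs, b.inputs)
            || !Collections.disjoint(b.outputs, a.inputs)
            || !Collections.disjoint(a.outputs, b.outputs);
    }

    /**
     * Walks the tasks in declaration order and packs them into groups.
     * Groups run one after the other; the members of one group are
     * pairwise conflict-free and could therefore run in parallel.
     */
    static List<List<Task>> parallelGroups(List<Task> tasks) {
        List<List<Task>> groups = new ArrayList<>();
        for (Task task : tasks) {
            List<Task> current = groups.isEmpty() ? null : groups.get(groups.size() - 1);
            if (current == null || current.stream().anyMatch(other -> conflicts(other, task))) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(task);
        }
        return groups;
    }
}
```

Because a task only ever joins the latest group or opens a new one, the declared order between conflicting tasks is preserved.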

Romain Manni-Bucau
2018-01-19 14:55:43 UTC
Hi Robert,

this is the Gradle way, but it only works because Gradle also has the
dependency chain between plugins. This is missing in Maven (something like
group:artifact@execution before group':artifact'@execution', or after), so
it can't be worked out from inputs and outputs alone while keeping backward
compatibility. And if you assume all plugins will migrate, the feature
would only reach end users after a very long time, which would be sad.
That is why I thought of letting the user help by providing some hints.

Anyway, being able to parallelise the code analysis, or the front and back
builds for modules that combine frontend-maven-plugin with standard Java,
etc. would be very beneficial if we can get something working with Maven.





Tibor Digana
2018-01-22 00:04:28 UTC
I am facing very slow Maven reporting/site generation.
I have optimized Javadoc and stopped the javadoc plugin from downloading
sources and the Java API, which improved performance, but the site still
takes at least 8 minutes to complete.
It would be worthwhile to be able to mark some reporting plugins as running
in parallel in the aggregator POM.
Romain Manni-Bucau
2018-12-06 13:29:33 UTC
Hey guys,

any way we move that topic forward beginning of next year?

Mickael Istria
2018-12-06 13:44:00 UTC
Hi,
Post by Romain Manni-Bucau
any way we move that topic forward beginning of next year?
I guess providing patches would be the best way ;)

I'm coming late to this discussion and I'm a newcomer on this list, so I
may be missing context. This could relate to an effort we are making in the
Eclipse IDE (and Eclipse m2e) to run module builds in parallel. One question
I have is: how do you know that 2 tasks aren't conflicting? We didn't figure
out a safe way to know that in m2e; maybe I missed something?

Cheers,
Romain Manni-Bucau
2018-12-06 13:47:04 UTC
Currently Maven can't, but I expect a way to do it, either in the next XSD
as originally proposed or, why not, with a naming convention in the id of
the execution (<execution>my-exec#after#other-exec</execution> or something
like that if we want it before Maven 4).

The nice thing is that once this is done, phases become pretty much useless
(they are just an implicit way of expressing these dependencies), and the
whole build becomes parallelizable, not just modules. Module-level
parallelism often runs into bottleneck modules in projects that build a
distribution.
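
To make the naming-convention idea concrete, here is a minimal sketch that derives an execution order from ids of the form `name#after#other`. The `#after#` separator is only the convention floated above, nothing Maven understands today, and `ExecutionIdOrderSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExecutionIdOrderSketch {

    /** Returns the execution names in an order that respects every "#after#" constraint. */
    static List<String> order(List<String> executionIds) {
        // Maps each execution name to the names that must run before it.
        Map<String, List<String>> predecessors = new LinkedHashMap<>();
        for (String id : executionIds) {
            String[] parts = id.split("#after#");
            List<String> before = predecessors.computeIfAbsent(parts[0], k -> new ArrayList<>());
            for (int i = 1; i < parts.length; i++) {
                before.add(parts[i]);
                // Names that are only referenced are treated as executions without constraints.
                predecessors.computeIfAbsent(parts[i], k -> new ArrayList<>());
            }
        }

        // Simple topological sort: repeatedly pick executions whose predecessors are all done.
        List<String> ordered = new ArrayList<>();
        while (ordered.size() < predecessors.size()) {
            boolean progressed = false;
            for (Map.Entry<String, List<String>> e : predecessors.entrySet()) {
                if (!ordered.contains(e.getKey()) && ordered.containsAll(e.getValue())) {
                    ordered.add(e.getKey());
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalArgumentException("cyclic #after# constraints");
            }
        }
        return ordered;
    }
}
```

Executions that have no constraint between them are exactly the ones a scheduler would be free to run concurrently.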

Mickael Istria
2018-12-06 14:00:07 UTC
I think there is a difference between scheduling tasks (one runs after the
other) and assuming that 2 tasks that are ready can run in parallel.
Adding scheduling data would be helpful and would get rid of the concept of
phases, for sure; but we also, and more importantly, need the existing
mojos to be audited and to be able to declare whether they're thread-safe
before running them in parallel, and this sounds like a gigantic amount of
work.
Romain Manni-Bucau
2018-12-06 14:34:50 UTC
Mojos already have the ability to say if they are threadsafe, what do you
see missing?

Mickael Istria
2018-12-06 14:37:34 UTC
Ok, my bad, thanks for the hint.
Enrico Olivelli
2018-12-07 20:22:43 UTC
What about having parallel 'planes' of execution?
Stuff like checkstyle, rat, or a validation plugin may run in its own plane
of execution.
By default every existing plugin will be on a 'default' plane.
When the build starts we start a thread (or fork a process) for each plane
used by the plugins declared in the pom.
Maybe we could define standard planes so that plugins are able to
choose from a well-known list of ids.

This approach is very naive, because it does not deal with a real graph, but
it can be an easy step compared to a global refactoring or the introduction
of input/output declarations for each existing plugin.

This can be orthogonal to phases: each plane will execute every one of the
phases.

Maybe I have a limited view of the Maven core model.
Hope that helps

Enrico
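
A minimal sketch of the 'planes' idea as described above: one thread per plane, each plane walking all phases in order and running only its own executions. `PlaneSketch`, `Execution` and the plane ids are hypothetical; the log entry stands in for actually invoking a mojo.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PlaneSketch {

    /** Hypothetical binding of a plugin goal to a phase and a plane. */
    static final class Execution {
        final String goal;
        final String phase;
        final String plane;

        Execution(String goal, String phase, String plane) {
            this.goal = goal;
            this.phase = phase;
            this.plane = plane;
        }
    }

    /** Starts one thread per plane and returns, per plane, the executions in the order they ran. */
    static Map<String, List<String>> run(List<String> phases, List<Execution> executions) {
        // One log per plane; each log is written by exactly one thread.
        Map<String, List<String>> logs = new LinkedHashMap<>();
        for (Execution e : executions) {
            logs.computeIfAbsent(e.plane, k -> new ArrayList<>());
        }

        List<Thread> threads = new ArrayList<>();
        for (Map.Entry<String, List<String>> plane : logs.entrySet()) {
            Thread thread = new Thread(() -> {
                for (String phase : phases) {
                    for (Execution e : executions) {
                        if (e.plane.equals(plane.getKey()) && e.phase.equals(phase)) {
                            // A real build would invoke the mojo here.
                            plane.getValue().add(phase + ":" + e.goal);
                        }
                    }
                }
            }, "plane-" + plane.getKey());
            threads.add(thread);
            thread.start();
        }

        // The build is finished once every plane has gone through all phases.
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for " + thread.getName(), ex);
            }
        }
        return logs;
    }
}
```

As noted in the message, planes never wait for each other in this sketch, so it is only safe when executions on different planes are truly independent.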
Tibor Digana
2018-12-07 23:50:27 UTC
In my projects, most plugins use a single execution.
External projects also follow this kind of principle.
Thus we should look at those possibilities where most plugins
can gain performance.
Usually the compiler and tests take long.
I know that maven-compiler-plugin:3.8.1 will be incremental, which is good
of course, but we should perhaps keep working on build performance.
If somebody has an idea of how to develop a compiler which partially
compiles a module depending on SCM changes, feel free to bring it to our
mailing list. The same goes for tests, where the set of tests to run would
be chosen depending on SCM changes.

Cheers
Tibor
Romain Manni-Bucau
2018-12-08 12:14:26 UTC
Using the SCM is not enough, or it only is for single-module projects.

You need a graph of dependencies (inputs/outputs) and have to save each
task's state in target to get incremental support.

But please note that an incremental build != a parallel build at mojo level.

The latter is easy to do and a quick win IMHO.
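
For the incremental side, a minimal sketch of "saving each task's state" could fingerprint a task's inputs and compare the result with what was recorded after the last successful run. `IncrementalStateSketch` is a hypothetical name, and the in-memory map is only a stand-in for a state file that would live under target.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class IncrementalStateSketch {

    // Stand-in for state persisted under target/: task id -> fingerprint of its inputs.
    private final Map<String, String> savedState = new HashMap<>();

    /** True when the inputs are unchanged since the last recorded successful run. */
    boolean isUpToDate(String taskId, Map<String, String> inputContents) {
        return fingerprint(inputContents).equals(savedState.get(taskId));
    }

    /** Records the inputs of a run; meant to be called only after the task succeeded. */
    void recordSuccess(String taskId, Map<String, String> inputContents) {
        savedState.put(taskId, fingerprint(inputContents));
    }

    /** SHA-256 over the input names and contents, in sorted name order for stability. */
    static String fingerprint(Map<String, String> inputContents) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> e : new TreeMap<>(inputContents).entrySet()) {
                digest.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between name and content
                digest.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between entries
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", ex);
        }
    }
}
```

This only answers whether a single task can be skipped; it says nothing about which tasks may run at the same time, which is the distinction drawn above.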