Incremental builds

Incremental builds

Joachim Nilsson
Hi, I ran into a tricky part of the code that I would like some hints on.
Background: I am investigating how to keep incremental builds working after the rootDir is renamed or moved (a normal situation on CI servers that run many executors on different slaves).

The first part was fairly easy to solve, in a not so pretty way, but with minimal changes.
Now I can read the task history even after a move of directories.

My solution for the first part was to use the incoming task in CacheBackedTaskHistoryRepository to get the project root dir and then strip that part from the path in the binary file just before serializing. Changing the internal representation to non-absolute paths would have too big an impact for me to grasp, so I figured a small change to the binary format (a new boolean before the path) was better.

            public void write(Encoder encoder, LazyTaskExecution execution) throws Exception {
                ...
                encoder.writeInt(execution.getOutputFiles().size());
                for (String outputFile : execution.getOutputFiles()) {
                    if (canBeConvertedFromAbsolutePath(outputFile)) {
                        // Path is under the project root dir: flag it and store it without the root prefix.
                        encoder.writeBoolean(true);
                        encoder.writeString(convertFromAbsolutePath(outputFile));
                    } else {
                        // Path is outside the project root dir: store it unchanged.
                        encoder.writeBoolean(false);
                        encoder.writeString(outputFile);
                    }
                }
                ...
            }
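
The reading side has to consume the same flag and re-anchor the path under whatever the root dir is at that point. A minimal, self-contained sketch of both directions follows. It uses plain java.io.DataOutput and DataInput in place of Gradle's Encoder, assumes "/" as the path separator, and the names RelocatablePathCodec, write, read and roundTrip are made up for illustration; none of this is Gradle code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper, not Gradle code: stores a path relative to rootDir when possible.
class RelocatablePathCodec {
    // Writes a flag, then either the part of the path below rootDir or the untouched path.
    static void write(DataOutput out, String rootDir, String path) throws IOException {
        String prefix = rootDir.endsWith("/") ? rootDir : rootDir + "/";
        boolean relative = path.startsWith(prefix);
        out.writeBoolean(relative);
        out.writeUTF(relative ? path.substring(prefix.length()) : path);
    }

    // Reads the flag and re-anchors relative entries under the current rootDir.
    static String read(DataInput in, String rootDir) throws IOException {
        boolean relative = in.readBoolean();
        String stored = in.readUTF();
        if (!relative) {
            return stored;
        }
        return (rootDir.endsWith("/") ? rootDir : rootDir + "/") + stored;
    }

    // Serializes under oldRoot and deserializes under newRoot, all in memory.
    static String roundTrip(String oldRoot, String newRoot, String path) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), oldRoot, path);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())), newRoot);
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory streams
        }
    }
}
```

With this sketch, a path written under /ci/a/project comes back under /ci/b/project after the move, while a path outside the root comes back exactly as it was written.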

However, that was not enough. The CacheBackedFileSnapshotRepository is also involved in marking a task as not UP-TO-DATE, since it compares output files directly between runs. This time the whole FileCollectionSnapshot is stored, so my minimal-change approach will not work.

Any suggestions for handling the CacheBackedFileSnapshotRepository?
Here, the whole object is serialized, and the serializer has no access to Task, Project or Gradle. Replacing the object with relative file references would probably break the serialization, since it uses File.exists for each file.

Regards,
Joachim

Re: Incremental builds

Adam Murdoch

On 21 Feb 2014, at 6:02 pm, Joachim Nilsson <[hidden email]> wrote:

Any suggestions for handling the CacheBackedFileSnapshotRepository?
Here, the whole object is serialized, and the serializer has no access to Task, Project or Gradle. Replacing the object with relative file references would probably break the serialization, since it uses File.exists for each file.

I would use the cache base directory as the base directory, in both CacheBackedTaskHistoryRepository and CacheBackedFileSnapshotRepository. It can be queried from TaskArtifactStateCacheAccess, but you’ll have to add a getter on it to expose the cache base dir.
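
One way to read this suggestion is that the repository, which knows the base directory, rebases paths on the way in and out, so the serializer itself never needs Task, Project or Gradle. The sketch below only illustrates that idea. SnapshotStore, the path-to-hash map standing in for a snapshot, the "rel:"/"abs:" key prefixes and the "/" separator are all assumptions; the real shape of FileCollectionSnapshot is not shown in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a snapshot repository; a snapshot is modelled as path -> content hash.
class SnapshotStore {
    private final Map<String, String> stored = new LinkedHashMap<String, String>();

    // Keys below baseDir are stored relative to it; everything else is stored as given.
    void save(String baseDir, Map<String, String> snapshot) {
        String prefix = withSlash(baseDir);
        stored.clear();
        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            String path = e.getKey();
            String key = path.startsWith(prefix)
                    ? "rel:" + path.substring(prefix.length())
                    : "abs:" + path;
            stored.put(key, e.getValue());
        }
    }

    // Rebuilds absolute keys against whatever base directory is current at load time.
    Map<String, String> load(String baseDir) {
        String prefix = withSlash(baseDir);
        Map<String, String> snapshot = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> e : stored.entrySet()) {
            String key = e.getKey();
            String path = key.startsWith("rel:") ? prefix + key.substring(4) : key.substring(4);
            snapshot.put(path, e.getValue());
        }
        return snapshot;
    }

    private static String withSlash(String dir) {
        return dir.endsWith("/") ? dir : dir + "/";
    }
}
```

Saving with one base directory and loading with another yields the same hashes under the new location, so a comparison between runs would no longer fail just because the directory moved.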


--
Adam Murdoch
Gradle Co-founder
http://www.gradle.org
VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting
http://www.gradleware.com




Re: Incremental builds

Joachim Nilsson
Thanks Adam.

A follow up question:
CONTRIBUTION.md states 'Avoid using features introduced in Java 1.6 or later'.
Is this a hard requirement, or could I use java.nio.file.Path from Java 7?
That would really simplify the use of relative paths, which I will need if I use cache dir as base.

I have one concern: what if outputs or inputs reside in directories outside the root project?
In that case, forcing the use of relative paths may break portability as well, if the destination directory is not at the same path depth as the origin.
With my first, simple approach of using rootDir to detect when a path can be rewritten, paths and files outside the project would remain absolute when serialized.
This may be a minor concern and can probably be handled by documentation. Just something to think about.
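
To make the path depth concern concrete, here is a small illustration using java.net.URI to resolve a stored relative path against two different root dirs. The directory names and the PathDepthDemo class are invented for the example.

```java
import java.net.URI;

// Illustration only: resolves a stored relative path against a root directory.
class PathDepthDemo {
    static String resolve(String rootDir, String relativePath) {
        // The trailing slash makes the URI denote a directory, so ".." climbs out of rootDir.
        URI root = URI.create("file:" + rootDir + "/");
        return root.resolve(relativePath).getPath();
    }

    public static void main(String[] args) {
        // Recorded for /ci/shared/lib.jar while the project lived in /ci/a/project.
        String stored = "../../shared/lib.jar";
        System.out.println(resolve("/ci/a/project", stored));
        System.out.println(resolve("/ci/executors/3/project", stored));
    }
}
```

The first call gives back /ci/shared/lib.jar, but after moving the project one level deeper the same stored path points at /ci/executors/shared/lib.jar, which is a different file.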

Regards,
Joachim



On Wed, Feb 26, 2014 at 4:46 AM, Adam Murdoch <[hidden email]> wrote:

I would use the cache base directory as the base directory, in both CacheBackedTaskHistoryRepository and CacheBackedFileSnapshotRepository. It can be queried from TaskArtifactStateCacheAccess, but you’ll have to add a getter on it to expose the cache base dir.



Re: Incremental builds

Luke Daley-2
Our baseline is Java 1.5, so use of Path is out.
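
For what it is worth, relativizing does not strictly need Path. A rough sketch that stays within java.io.File and java.net.URI, both of which predate Java 1.5, could look like this. The LegacyRelativizer class and its method names are made up, and this is not how Gradle does it.

```java
import java.io.File;
import java.net.URI;

// Sketch of relativizing without java.nio.file.Path; class and method names are made up.
class LegacyRelativizer {
    // Returns the path of file relative to baseDir, or null when file is not below baseDir.
    static String relativize(File baseDir, File file) {
        URI relative = asDirectoryUri(baseDir).relativize(file.toURI());
        // URI.relativize hands back the original absolute URI when it cannot relativize.
        return relative.isAbsolute() ? null : relative.getPath();
    }

    // File.toURI() only appends a trailing slash for directories that exist, so add it explicitly.
    private static URI asDirectoryUri(File dir) {
        String uri = dir.toURI().toString();
        return URI.create(uri.endsWith("/") ? uri : uri + "/");
    }
}
```

A null result can then serve as the signal to keep the path absolute, which matches the rootDir-based detection discussed earlier in the thread.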

On Wed, Feb 26, 2014 at 5:17 AM, Joachim Nilsson <[hidden email]> wrote:

Thanks Adam.

A follow up question:
CONTRIBUTION.md states 'Avoid using features introduced in Java 1.6 or later'.
Is this a hard requirement, or could I use java.nio.file.Path from Java 7?
That would really simplify the use of relative paths, which I will need if I use cache dir as base.


Re: Incremental builds

Adam Murdoch
In reply to this post by Joachim Nilsson

On 27 Feb 2014, at 12:17 am, Joachim Nilsson <[hidden email]> wrote:

I have one concern: what if outputs or inputs reside in directories outside the root project?
In that case, forcing the use of relative paths may break portability as well, if the destination directory is not at the same path depth as the origin.
With my first, simple approach of using rootDir to detect when a path can be rewritten, paths and files outside the project would remain absolute when serialised.

You can (and should) still use the same approach. You can always change TaskArtifactStateCacheAccess to provide the root directory instead of the cache directory.

This may be a minor concern and can probably be handled by documentation. Just something to think about.

The plan is to switch, at some point, from using paths to using input and output content as the key incremental state. Then, when a task executes, we can check whether any task of the same type has run with the same input content anywhere on the machine, and what output content it produced. We can then compare the content of the actual outputs to see if the task is out of date.

Then, the incremental build will be able to deal with many more kinds of changes that it currently does not - things like moving the build directory, moving the source directories, renaming tasks, and so on.

Of course, it’s not that simple, but this should give you some idea of the plan.
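
As a very rough illustration of that idea (not the actual design), the history could be keyed by task type plus a hash of the input content, and the recorded output hash compared with what is on disk now. ContentKeyedHistory, its methods, the use of SHA-1 and the flat byte arrays standing in for file content are all assumptions made for this sketch.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: task history keyed by task type and input content instead of paths.
class ContentKeyedHistory {
    private final Map<String, String> outputHashByKey = new HashMap<String, String>();

    // Hex SHA-1 of the given content.
    static String hash(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            return new BigInteger(1, sha1.digest(content)).toString(16);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // standard JDKs ship SHA-1
        }
    }

    // Remembers which output content a task type produced for some input content.
    void record(String taskType, byte[] inputs, byte[] outputs) {
        outputHashByKey.put(taskType + ":" + hash(inputs), hash(outputs));
    }

    // Up to date when the same task type already ran on the same input content and the
    // current outputs still match what that run produced; no paths are involved.
    boolean isUpToDate(String taskType, byte[] inputs, byte[] currentOutputs) {
        String expected = outputHashByKey.get(taskType + ":" + hash(inputs));
        return expected != null && expected.equals(hash(currentOutputs));
    }
}
```

Because nothing in the key mentions a location, moving the build directory or the sources would not by itself make a task out of date in this model.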





Re: Incremental builds

Daz DeBoer-2
On Wed, Feb 26, 2014 at 2:05 PM, Adam Murdoch <[hidden email]> wrote:

The plan is to switch, at some point, from using paths to using input and output content as the key incremental state. Then, when a task executes, we can check whether any task of the same type has run with the same input content anywhere on the machine, and what output content it produced. We can then compare the content of the actual outputs to see if the task is out of date.

Once we're using the task inputs as the key, it might be easier to move to a distributed cache of build outputs, which would mean that we could reuse artifacts previously built by another developer or by CI. That would be a big step forward for build avoidance.



Re: Incremental builds

Adam Murdoch

On 27 Feb 2014, at 8:42 am, Daz DeBoer <[hidden email]> wrote:

Once we're using the task inputs as the key, it might be easier to move to a distributed cache of build outputs, which would mean that we could reuse artifacts previously built by another developer or by CI. That would be a big step forward for build avoidance.

That’s exactly the idea.

Once we’re more tolerant of things moving around, we can start reusing the outputs from one task to produce the outputs for some other task with the same input content, by simply copying the outputs into place.

For example, when compiling C++ sources for multiple variants with the same source files and the same compiler settings, I can compile once and then reuse the resulting object files. The same applies when generating the resources for variants of an Android app. Or when I rename a jar without changing the inputs (e.g. the version changes), I can just rename the jar.
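
A toy sketch of the "compile once, then copy" idea, under heavy assumptions: OutputReuseCache, the Work interface and the executions counter are invented here, outputs are plain byte arrays held in memory, and the key is a literal rendering of the input bytes where a real implementation would presumably use a content hash.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of output reuse: real work is skipped when the same task type
// has already processed identical input content.
class OutputReuseCache {
    // The actual task action, e.g. invoking a compiler.
    interface Work {
        byte[] run(byte[] inputs);
    }

    private final Map<String, byte[]> outputsByKey = new HashMap<String, byte[]>();
    int executions = 0; // counts how often real work was done

    byte[] outputsFor(String taskType, byte[] inputs, Work work) {
        // Stand-in for a content hash: the exact input bytes rendered as text.
        String key = taskType + ":" + Arrays.toString(inputs);
        byte[] cached = outputsByKey.get(key);
        if (cached == null) {
            executions++;
            cached = work.run(inputs);
            outputsByKey.put(key, cached);
        }
        return cached.clone(); // models copying the stored outputs into place
    }
}
```

Asking for the outputs of two variants that share source content and settings runs the work once; the second request is served by the copy.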

Then, if we move this cache so that it’s shared by all builds on the machine, we can reuse outputs between builds, whether I’ve moved the build or have multiple checkouts. That is nice for people working on branches of the same thing, and for CI machines.

And then it’s a (conceptually) simple step to share this cache across machines.

Of course, it’s not quite that simple, but that’s the rough plan.

