Implementing incremental tasks


Xavier Ducrohet
Currently plugins don't have access to the list of changed files (input and output) that triggered Gradle to re-run a task.

The Android plugin is adding several steps to the compilation, and we'd like to make sure that we can do them as incremental steps. We could replicate what Gradle already computes but this seems silly as Gradle already has the data.

Any chance this will become part of the official API?

We are also looking at some tasks that would need to store some information somewhere to speed up incremental runs. I see there are some things stored inside .gradle/<version>/taskArtifacts and I was wondering whether plugins can get access to this (or simply write directly to it).

thanks
Re: Implementing incremental tasks

Adam Murdoch

On 05/12/2012, at 2:23 PM, Xavier Ducrohet wrote:

Currently plugins don't have access to the list of changed files (input and output) that triggered Gradle to re-run a task.

The Android plugin is adding several steps to the compilation, and we'd like to make sure that we can do them as incremental steps. We could replicate what Gradle already computes but this seems silly as Gradle already has the data.

Any chance this will become part of the official API?

Absolutely. We want to provide something there. We would also use this in Gradle to improve the compile tasks (to fix GRADLE-1501, for example) and make the copy task incremental, and (probably) to fix some long standing bugs with the copy and archive tasks.

There's no concrete plan just yet for how that API would look.

What information would you need from such an API?


We are also looking at some tasks that would need to store some information somewhere to speed up incremental runs. I see there are some things stored inside .gradle/<version>/taskArtifacts and I was wondering whether plugins can get access to this (or simply write directly to it).

Not yet. The above API might provide something, where you can attach some state to the task, or perhaps to each output file, and have Gradle persist it for you.

What sort of information do you want to store?


--
Adam Murdoch
Gradle Co-founder
http://www.gradle.org
VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting
http://www.gradleware.com

Re: Implementing incremental tasks

Xavier Ducrohet
I think we would need:
- the list of changed input files, with the kind of change: added/removed/modified
- the list of changed output files, which should only be removals, I guess. Someone could hand-edit an output file, but that amounts to the same thing: any output file that has been touched in any way should trigger a new build.
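
To make the request concrete, here is a minimal sketch of the kind of change list described above. It is not an existing Gradle API: the class name `FileChangeSketch`, the `ChangeKind` enum, and the idea of comparing two path-to-hash snapshots are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names come from Gradle. */
final class FileChangeSketch {

    /** The three kinds of change asked for in the list above. */
    enum ChangeKind { ADDED, REMOVED, MODIFIED }

    /**
     * Compares two snapshots (file path -> content hash) and returns
     * the changed paths together with the kind of change.
     */
    static Map<String, ChangeKind> diff(Map<String, String> previous,
                                        Map<String, String> current) {
        Map<String, ChangeKind> changes = new TreeMap<>();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String oldHash = previous.get(entry.getKey());
            if (oldHash == null) {
                // Present now, absent in the previous snapshot.
                changes.put(entry.getKey(), ChangeKind.ADDED);
            } else if (!oldHash.equals(entry.getValue())) {
                // Present in both snapshots, but the content differs.
                changes.put(entry.getKey(), ChangeKind.MODIFIED);
            }
        }
        for (String path : previous.keySet()) {
            if (!current.containsKey(path)) {
                // Present in the previous snapshot, gone now.
                changes.put(path, ChangeKind.REMOVED);
            }
        }
        return changes;
    }
}
```

The same comparison would apply to output files, where any reported change (removal or hand edit) would mean the output has to be regenerated.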

Regarding task storage, we have several cases:

- We have tasks that take all the files in a folder and convert each one into a new file, 1 to 1. No storage is needed here.

- We have tasks that go through the files in a folder and compile them one by one. These files may have dependencies as well as more than one output, so during compilation we generate a file that lists the inputs and outputs of that particular compilation.
The task compiles many such files, so we have to combine this data with the input/output changes given to the task to figure out whether each file actually needs recompilation.
Right now we store this file in build/, which is not a great place for it (but OK). It would be better to store it somewhere else, but this is probably fine for now since we are limited to writing a file in a folder.

- We also have a task that takes the content of a whole folder and processes it to generate some output. The task builds a model from those files, recording which file contributed to which part of the model, before generating the output. We want to store this model in a form that is faster to read and write than the original files, so that when the task runs again we can load it and apply changes based on which input files changed.
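
As an illustration of the second case, this is roughly how the stored per-compilation data could be combined with the changed files. It is only a sketch under assumptions: `RecompileSketch`, `CompileRecord`, and the file names used in the usage note are hypothetical, and the records are held in memory rather than read from the generated file.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; none of these names come from Gradle or the Android plugin. */
final class RecompileSketch {

    /** What one compilation read and wrote the last time it ran. */
    static final class CompileRecord {
        final Set<String> inputs;
        final Set<String> outputs;

        CompileRecord(Set<String> inputs, Set<String> outputs) {
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    /**
     * Returns the sources that need recompiling: those with no record yet,
     * and those whose source, recorded inputs, or recorded outputs changed.
     */
    static Set<String> sourcesToRecompile(Map<String, CompileRecord> records,
                                          Set<String> allSources,
                                          Set<String> changedFiles) {
        Set<String> stale = new TreeSet<>();
        for (String source : allSources) {
            CompileRecord record = records.get(source);
            if (record == null) {
                // Never compiled before, so there is nothing to reuse.
                stale.add(source);
                continue;
            }
            boolean touched = changedFiles.contains(source)
                    || record.inputs.stream().anyMatch(changedFiles::contains)
                    || record.outputs.stream().anyMatch(changedFiles::contains);
            if (touched) {
                stale.add(source);
            }
        }
        return stale;
    }
}
```

For example, if `a.src` recorded `common.h` as an input and `common.h` is reported as changed, only `a.src` (plus any source with no record) is recompiled, while `b.src` is left alone.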

thanks
Xavier




Re: Implementing incremental tasks

Adam Murdoch

On 06/12/2012, at 8:08 AM, Xavier Ducrohet wrote:

I think we would need:
- the list of changed input files, with the kind of change: added/removed/modified
- the list of changed output files, which should only be removals, I guess. Someone could hand-edit an output file, but that amounts to the same thing: any output file that has been touched in any way should trigger a new build.

I'd guess you also need some indication of which task settings have changed.


Regarding task storage, we have several cases:

- We have tasks that take all the files in a folder and convert each one into a new file, 1 to 1. No storage is needed here.

- We have tasks that go through the files in a folder and compile them one by one. These files may have dependencies as well as more than one output, so during compilation we generate a file that lists the inputs and outputs of that particular compilation.
The task compiles many such files, so we have to combine this data with the input/output changes given to the task to figure out whether each file actually needs recompilation.
Right now we store this file in build/, which is not a great place for it (but OK). It would be better to store it somewhere else, but this is probably fine for now since we are limited to writing a file in a folder.

- We also have a task that takes the content of a whole folder and processes it to generate some output. The task builds a model from those files, recording which file contributed to which part of the model, before generating the output. We want to store this model in a form that is faster to read and write than the original files, so that when the task runs again we can load it and apply changes based on which input files changed.

We might start with something simple like an API that you can ask for a directory that the task can cache stuff in, and what you write to that directory is up to you. It may also support a couple of other things that we've found useful:

* Invalidating the cache when the task implementation version changes.
* Locking the cache.
* Cleaning up after a crash.
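
A rough sketch of what such a cache directory helper might do, using only the JDK. This is not a Gradle API: `TaskCacheSketch`, `open`, and the marker file names `cache.version` and `cache.lock` are assumptions for illustration. It covers version invalidation and locking; crash handling is limited to removing the version marker first, so that an interrupted wipe is simply repeated on the next open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical sketch; none of these names come from Gradle. */
final class TaskCacheSketch {
    private static final String VERSION_FILE = "cache.version";
    private static final String LOCK_FILE = "cache.lock";

    /**
     * Prepares the cache directory for a task implementation version.
     * Returns true if the existing contents were written by the same
     * version and were kept, false if the cache was (re)initialised.
     */
    static boolean open(Path cacheDir, String implVersion) throws IOException {
        Files.createDirectories(cacheDir);
        Path lockFile = cacheDir.resolve(LOCK_FILE);
        // Hold an exclusive file lock while inspecting or wiping the cache.
        try (FileChannel channel = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            Path versionFile = cacheDir.resolve(VERSION_FILE);
            if (Files.exists(versionFile)
                    && Files.readString(versionFile).equals(implVersion)) {
                return true;
            }
            // Remove the marker first: if the wipe is interrupted, the
            // next open sees no marker and wipes again.
            Files.deleteIfExists(versionFile);
            try (Stream<Path> entries = Files.walk(cacheDir)) {
                entries.sorted(Comparator.reverseOrder()) // children before parents
                        .filter(p -> !p.equals(cacheDir) && !p.equals(lockFile))
                        .forEach(TaskCacheSketch::delete);
            }
            Files.writeString(versionFile, implVersion);
            return false;
        }
    }

    private static void delete(Path path) {
        try {
            Files.delete(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

What the task writes into the directory after `open` returns is entirely up to the task, which matches the "what you write to that directory is up to you" idea above.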




Re: Implementing incremental tasks

Xavier Ducrohet
On Mon, Dec 10, 2012 at 12:41 PM, Adam Murdoch <[hidden email]> wrote:

On 06/12/2012, at 8:08 AM, Xavier Ducrohet wrote:

I think we would need:
- the list of changed input files, with the kind of change: added/removed/modified
- the list of changed output files, which should only be removals, I guess. Someone could hand-edit an output file, but that amounts to the same thing: any output file that has been touched in any way should trigger a new build.

I'd guess you also need some indication of which task settings have changed.


True. At first, I think it might be easier to have a global "some-non-file-input-changed" flag that triggers a full build.
Incremental execution matters when working on the files, but if a build or task setting changes, I think it's perfectly fine to trigger a full run of that particular task.
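
In code, the decision I have in mind is roughly the following. This is only a sketch: `RebuildPlanSketch`, `Mode`, and the two parameters are made-up names, not part of any existing API.

```java
/** Hypothetical sketch; none of these names come from Gradle. */
final class RebuildPlanSketch {

    enum Mode { UP_TO_DATE, INCREMENTAL, FULL }

    /**
     * A single coarse flag for non-file inputs forces a full run;
     * otherwise the task only handles the files that changed.
     */
    static Mode plan(boolean nonFileInputChanged, int changedFileCount) {
        if (nonFileInputChanged) {
            return Mode.FULL;
        }
        return changedFileCount == 0 ? Mode.UP_TO_DATE : Mode.INCREMENTAL;
    }
}
```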

We might start with something simple like an API that you can ask for a directory that the task can cache stuff in, and what you write to that directory is up to you. It may also support a couple of other things that we've found useful:

* Invalidating the cache when the task implementation version changes.
* Locking the cache.
* Cleaning up after a crash.


That would work fine for our needs. thanks!