Namespace Pulumi.Gcp.Dataflow
Classes
Job
Creates a job on Dataflow, which is an implementation of Apache Beam running on Google Compute Engine. For more information see the official documentation for Beam and Dataflow.
Example Usage
using Pulumi;
using Gcp = Pulumi.Gcp;

class MyStack : Stack
{
    public MyStack()
    {
        var bigDataJob = new Gcp.Dataflow.Job("bigDataJob", new Gcp.Dataflow.JobArgs
        {
            Parameters =
            {
                { "baz", "qux" },
                { "foo", "bar" },
            },
            TempGcsLocation = "gs://my-bucket/tmp_dir",
            TemplateGcsPath = "gs://my-bucket/templates/template_file",
        });
    }
}
Note on "destroy" / "apply"
There are many types of Dataflow jobs. Some run continuously, reading new data from a source such as a GCS bucket and writing output continuously; others process a fixed amount of data and then terminate. All jobs can fail while running due to programming errors or other issues. In this way, Dataflow jobs are different from most other Google resources.
The Dataflow resource is considered 'existing' while it is in a nonterminal state. If it reaches a terminal state (e.g. 'FAILED', 'COMPLETE', 'CANCELLED'), it will be recreated on the next 'apply'. This is as expected for jobs which run continuously, but may surprise users who use this resource for other kinds of Dataflow jobs.
A Dataflow job which is 'destroyed' may be "cancelled" or "drained". If "cancelled", the job terminates immediately: any data already written remains where it is, but no new data will be processed. If "drained", no new data will enter the pipeline, but any data currently in the pipeline will finish being processed. The default is "cancelled"; if you set OnDelete to "drain" in the job's arguments, expect a longer wait for pulumi destroy to complete while in-flight data finishes processing.
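For example, the following sketch (reusing the hypothetical bucket paths from the example above) sets OnDelete so that pulumi destroy drains the job instead of cancelling it:

using Pulumi;
using Gcp = Pulumi.Gcp;

class MyStack : Stack
{
    public MyStack()
    {
        // Same kind of templated job as above, but drained rather than cancelled on destroy.
        var drainingJob = new Gcp.Dataflow.Job("drainingJob", new Gcp.Dataflow.JobArgs
        {
            TempGcsLocation = "gs://my-bucket/tmp_dir",                  // hypothetical bucket path
            TemplateGcsPath = "gs://my-bucket/templates/template_file",  // hypothetical template path
            OnDelete = "drain",  // let in-flight data finish processing before the job stops
        });
    }
}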