Sylvite

The natural way to use the KCL

Configuration

Because Sylvite is a wrapper around the Amazon Kinesis Client Library (KCL), all KCL configuration settings are supported.

Configuration values can be set either via environment variables or via a configuration file. See the usage page for examples.
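Each setting on this page is listed under two names: an upper-case name with underscores and a camelCase name in parentheses. Assuming the upper-case form is the environment variable and the camelCase form is the configuration file key (the usage page is the authority here), the relationship between the two can be sketched roughly as follows. The helper `env_to_key` is hypothetical and is not part of Sylvite.

```python
def env_to_key(env_name: str) -> str:
    """Convert an UPPER_SNAKE_CASE name to its camelCase counterpart.

    Illustrative only: this mirrors the naming pattern used on this page,
    not any code inside Sylvite.
    """
    first, *rest = env_name.lower().split("_")
    return first + "".join(word.capitalize() for word in rest)


print(env_to_key("APPLICATION_NAME"))
print(env_to_key("FAILOVER_TIME_MILLIS"))
```

Names containing acronyms do not follow this mechanical rule exactly (for example, DYNAMO_DB_CREDENTIALS_PROVIDER is listed as dynamoDBCredentialsProvider), so treat the names on this page as authoritative.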

Required Configuration

The following settings are required.

APPLICATION_NAME (applicationName)

Type: string

Used by the KCL as the name of this application. It is also used as the name of the Amazon DynamoDB table that stores the lease and checkpoint information for workers with this application name.
EXECUTABLE_NAME (executableName)

Type: string

The child executable to spawn for each shard. The executable must speak the KCL multi-lang protocol. All environment variables passed to the parent process will be inherited by the child process.
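The environment inheritance described above is ordinary operating-system behaviour for child processes. The following sketch illustrates it with Python's standard `subprocess` module; it is not Sylvite's spawning code, and the variable name `SYLVITE_DEMO_VALUE` is made up for the example.

```python
import os
import subprocess
import sys


def read_in_child(name: str) -> str:
    """Spawn a child Python process and return the value it sees for `name`."""
    # The child prints the variable it finds in its own environment.
    code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Set a variable in the parent; no `env=` is passed to subprocess.run,
# so the child receives a copy of the parent's environment.
os.environ["SYLVITE_DEMO_VALUE"] = "hello"
print(read_in_child("SYLVITE_DEMO_VALUE"))
```

In practice this means that credentials, region settings, or application-specific variables exported before starting the parent process are also visible to the executable spawned for each shard.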
STREAM_NAME (streamName)

Type: string

The Kinesis stream to process.
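As a minimal sketch, assuming the settings are supplied as environment variables, a pre-flight check for the three required values might look like this. The function `missing_required` is a hypothetical helper and is not provided by Sylvite.

```python
import os

# The three required settings listed above.
REQUIRED = ("APPLICATION_NAME", "EXECUTABLE_NAME", "STREAM_NAME")


def missing_required(env=None):
    """Return the required setting names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


missing = missing_required()
if missing:
    print("Missing required settings: " + ", ".join(missing))
```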

Optional Configuration

The following settings are optional. Each has a default value that is used when the setting is not set.

AWS_CREDENTIALS_PROVIDER (awsCredentialsProvider)

Type: string

Default: DefaultAWSCredentialsProviderChain

The default credential provider used to sign AWS requests.
CALL_PROCESS_RECORDS_EVENFOR_EMPTY_RECORD_LIST (callProcessRecordsEvenForEmptyRecordList)

Type: boolean

Default: false

Call the RecordProcessor::processRecords() API even if GetRecords returned an empty record list.
CLEANUP_LEASES_UPON_SHARD_COMPLETION (cleanupLeasesUponShardCompletion)

Type: boolean

Default: true

Clean up leases when a shard completes, rather than waiting until they expire in Kinesis. Keeping leases requires tracking and resources (for example, they need to be renewed and assigned), so by default the leases that are no longer needed are deleted.
CLOUD_WATCH_CREDENTIALS_PROVIDER (cloudWatchCredentialsProvider)

Type: string

Default: DefaultAWSCredentialsProviderChain

AWSCredentialsProvider for CloudWatch access. Should only be set if you want to override the default AWS_CREDENTIALS_PROVIDER.
DYNAMO_DB_CREDENTIALS_PROVIDER (dynamoDBCredentialsProvider)

Type: string

Default: DefaultAWSCredentialsProviderChain

AWSCredentialsProvider for DynamoDB access. Should only be set if you want to override the default AWS_CREDENTIALS_PROVIDER.
FAILOVER_TIME_MILLIS (failoverTimeMillis)

Type: integer

Default: 10000

Failover time in milliseconds. A worker that does not renew its lease within this time interval is regarded as having problems, and its shards are assigned to other workers. For applications that have a large number of shards, this may be set to a higher number to reduce the number of DynamoDB IOPS required for tracking leases.
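The failover rule can be pictured with a simplified model: a lease counts as expired once more than the failover time has passed since its last renewal. This is only a sketch of the idea, not the KCL's lease-taking code; the function name and the millisecond timestamps are assumptions made for illustration.

```python
def lease_expired(last_renewal_ms: int, now_ms: int,
                  failover_time_millis: int = 10000) -> bool:
    """Return True if the lease was not renewed within the failover window."""
    return now_ms - last_renewal_ms > failover_time_millis


# A lease last renewed 12 seconds ago is past the default 10-second window.
print(lease_expired(last_renewal_ms=0, now_ms=12_000))
```

Raising the value gives workers longer to renew before their shards are reassigned, at the cost of slower recovery when a worker really has failed.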
IDLE_TIME_BETWEEN_READS_IN_MILLIS (idleTimeBetweenReadsInMillis)

Type: integer

Default: 1000

Idle time between calls to fetch data from Kinesis. This should be tuned with MAX_RECORDS in order to ensure you are not falling behind.
INITIAL_LEASE_TABLE_READ_CAPACITY (initialLeaseTableReadCapacity)

Type: integer

Default: 10

The Amazon DynamoDB table used for tracking leases will be provisioned with this read capacity. This applies only if the table does not already exist; otherwise the capacity is not changed.
INITIAL_LEASE_TABLE_WRITE_CAPACITY (initialLeaseTableWriteCapacity)

Type: integer

Default: 10

The Amazon DynamoDB table used for tracking leases will be provisioned with this write capacity. This applies only if the table does not already exist; otherwise the capacity is not changed.
INITIAL_POSITION_IN_STREAM (initialPositionInStream)

Type: one of [LATEST, TRIM_HORIZON]

Default: TRIM_HORIZON

One of LATEST or TRIM_HORIZON. The Amazon Kinesis Client Library will start fetching records from this position when the application starts up if there are no checkpoints. If there are checkpoints, it will process records from the checkpoint position.
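The rule above (a checkpoint, when present, takes precedence over the configured initial position) can be sketched as follows. The function name and the returned labels `CHECKPOINT` and `INITIAL_POSITION` are hypothetical and chosen only for this illustration; they are not KCL identifiers.

```python
from typing import Optional, Tuple

VALID_INITIAL_POSITIONS = ("LATEST", "TRIM_HORIZON")


def starting_point(checkpoint: Optional[str],
                   initial_position: str = "TRIM_HORIZON") -> Tuple[str, str]:
    """Decide where processing of a shard starts."""
    if initial_position not in VALID_INITIAL_POSITIONS:
        raise ValueError(f"unsupported initial position: {initial_position}")
    if checkpoint is not None:
        # A stored checkpoint always wins over the configured position.
        return ("CHECKPOINT", checkpoint)
    return ("INITIAL_POSITION", initial_position)


print(starting_point(None))            # no checkpoint yet
print(starting_point("4959", "LATEST"))  # checkpoint exists
```

One consequence is that changing INITIAL_POSITION_IN_STREAM has no effect on an application that has already written checkpoints.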
KINESIS_CREDENTIALS_PROVIDER (kinesisCredentialsProvider)

Type: string

Default: DefaultAWSCredentialsProviderChain

AWSCredentialsProvider for Kinesis access. Should only be set if you want to override the default AWS_CREDENTIALS_PROVIDER.
MAX_ACTIVE_THREADS (maxActiveThreads)

Type: integer

Default: 0

The maximum number of threads the multi-lang daemon will use. The default value of 0 does not limit the number of threads and should only be changed if you really know what you're doing.
MAX_LEASES_FOR_WORKER (maxLeasesForWorker)

Type: integer

Default: 2,147,483,647

The max number of leases (shards) this worker should process. This can be useful to avoid overloading (and thrashing) a worker when a host has resource constraints or during deployment. NOTE: Setting this to a low value can cause data loss if workers are not able to pick up all shards in the stream due to the max limit.
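The data-loss risk in the note comes down to simple arithmetic: if the number of workers multiplied by the per-worker limit is smaller than the number of shards, some shards have no worker available to lease them. A minimal sketch of that check, with a hypothetical function name:

```python
def can_cover_all_shards(shard_count: int, worker_count: int,
                         max_leases_for_worker: int = 2_147_483_647) -> bool:
    """Return True if the workers together are allowed to hold every shard."""
    return worker_count * max_leases_for_worker >= shard_count


# 10 shards, 3 workers limited to 3 leases each: one shard is left unprocessed.
print(can_cover_all_shards(shard_count=10, worker_count=3, max_leases_for_worker=3))
```

When lowering the limit, remember that the shard count can grow after a reshard, so a value that is sufficient today may not stay sufficient.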
MAX_LEASES_TO_STEAL_AT_ONE_TIME (maxLeasesToStealAtOneTime)

Type: integer

Default: 1

Max leases to steal from a more loaded Worker at one time (for load balancing). Setting this to a higher number can allow for faster load convergence (e.g. during deployments, cold starts), but can cause higher churn in the system.
MAX_RECORDS (maxRecords)

Type: integer

Default: 10000

Max records to fetch in a Kinesis getRecords() call. This should be tuned with IDLE_TIME_BETWEEN_READS_IN_MILLIS in order to ensure you are not falling behind.
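As a rough guide to tuning the two settings together, the configuration alone caps the read rate per shard at MAX_RECORDS divided by the idle time between reads. The sketch below computes that ceiling; it deliberately ignores the time spent fetching and processing records and any limits imposed by Kinesis itself, so the real rate will be lower. The function names are hypothetical.

```python
def max_records_per_second(max_records: int,
                           idle_time_between_reads_in_millis: int) -> float:
    """Upper bound on records read per second per shard, from config alone."""
    if idle_time_between_reads_in_millis <= 0:
        return float("inf")  # no idle time, so no cap from this setting
    return max_records * 1000 / idle_time_between_reads_in_millis


def may_fall_behind(incoming_per_second: float, max_records: int,
                    idle_time_between_reads_in_millis: int) -> bool:
    """True if records arrive faster than the configured ceiling allows."""
    ceiling = max_records_per_second(max_records, idle_time_between_reads_in_millis)
    return incoming_per_second > ceiling


# With the defaults (10000 records, 1000 ms) the ceiling is 10000 records/s.
print(max_records_per_second(10000, 1000))
```

If `may_fall_behind` is true for your expected per-shard ingest rate, either raise MAX_RECORDS or lower IDLE_TIME_BETWEEN_READS_IN_MILLIS.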
METRICS_BUFFER_TIME_MILLIS (metricsBufferTimeMillis)

Type: integer

Default: 10000

Metrics are buffered for at most this long before publishing to CloudWatch.
METRICS_ENABLED_DIMENSIONS (metricsEnabledDimensions)

Type: string

Default: null

Sets the dimensions that are allowed to be emitted in metrics.
METRICS_LEVEL (metricsLevel)

Type: one of [NONE, SUMMARY, DETAILED]

Default: DETAILED

Sets metrics level that should be enabled. Possible values are:
  • NONE
  • SUMMARY
  • DETAILED
METRICS_MAX_QUEUE_SIZE (metricsMaxQueueSize)

Type: integer

Default: 10000

Max number of metrics to buffer before publishing to CloudWatch.
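METRICS_BUFFER_TIME_MILLIS and METRICS_MAX_QUEUE_SIZE both bound how long metrics wait before being published: one by age, the other by count. A simplified model of how two such limits typically interact is sketched below. It is an assumption made for illustration, not the KCL's publishing code, and the function name is hypothetical.

```python
def should_publish(queued_metrics: int, oldest_age_ms: int,
                   metrics_buffer_time_millis: int = 10000,
                   metrics_max_queue_size: int = 10000) -> bool:
    """Publish when the queue is full or the oldest metric has waited long enough."""
    if queued_metrics <= 0:
        return False  # nothing to publish
    queue_full = queued_metrics >= metrics_max_queue_size
    waited_too_long = oldest_age_ms >= metrics_buffer_time_millis
    return queue_full or waited_too_long


print(should_publish(queued_metrics=25, oldest_age_ms=10_000))
```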
PARENT_SHARD_POLL_INTERVAL_MILLIS (parentShardPollIntervalMillis)

Type: integer

Default: 10000

Interval in milliseconds between polling to check for parent shard completion. Polling frequently will take up more DynamoDB IOPS (when there are leases for shards waiting on completion of parent shards).
PROCESSING_LANGUAGE (processingLanguage)

Type: string

Default: null

The language you are using to process the stream. This has no purpose other than augmenting the multi-lang user-agent string.
REGION_NAME (regionName)

Type: string

Default: null

The AWS Region name for the service.
SHARD_SYNC_INTERVAL_MILLIS (shardSyncIntervalMillis)

Type: integer

Default: 60000

Shard sync interval in milliseconds, that is, how long to wait between shard sync tasks.
TASK_BACKOFF_TIME_MILLIS (taskBackoffTimeMillis)

Type: integer

Default: 500

Backoff time in milliseconds for Amazon Kinesis Client Library tasks (in the event of failures).
USER_AGENT (userAgent)

Type: string

Default: null

Override the default user-agent used in AWS requests.
VALIDATE_SEQUENCE_NUMBER_BEFORE_CHECKPOINTING (validateSequenceNumberBeforeCheckpointing)

Type: boolean

Default: true

Whether the KCL should validate client-provided sequence numbers with a call to Amazon Kinesis before actually checkpointing. If true, this makes a Kinesis getIterator() API call, which may cause throttling errors if you are checkpointing frequently.
WORKER_ID (workerId)

Type: string

Default: null

Explicit worker ID for the given worker, used to distinguish different workers/processes of a Kinesis application. Worker IDs must be unique across workers.