Sylvite

The natural way to use the KCL

Protocol

The multi-language protocol defines how a KCL multi-lang application communicates with another process (referred to as the “child process”) over the STDIN and STDOUT of the child process. The units of communication are JSON messages, each representing an action the receiving side should perform. The child process must react appropriately to four messages: initialize, processRecords, checkpoint, and shutdown. The KCL multi-lang application must react appropriately to two messages generated by the child process: status and checkpoint.

Action messages sent to child process

{
  "action": "initialize",
  "shardId": "string"
}
{
  "action": "processRecords",
  "records": [
    {
      "data" : "<base64encoded_string>",
      "partitionKey" : "<partition key>",
      "sequenceNumber" : "<sequence number>"
    },
    ...
  ]
}
{
  "action": "checkpoint",
  "checkpoint": "<sequence number>",
  "error": "<NameOfException>"
}
{
  "action" : "shutdown",
  "reason" : "<TERMINATE|ZOMBIE>"
}

Action messages sent to KCL by the child process

{
  "action" : "checkpoint",
  "checkpoint" : "<sequenceNumberToCheckpoint>"
}
{
  "action" : "status",
  "responseFor" : "<nameOfAction>"
}
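
The two child-to-KCL messages above can be built with a couple of small helpers. This is a minimal sketch, not part of Sylvite or the KCL: the function names `status_message` and `checkpoint_message` are hypothetical. It relies on `json.dumps` emitting no line breaks by default, which keeps each message on one line.

```python
import json


def status_message(response_for):
    """Build the one-line status reply for the action that was just handled."""
    return json.dumps({"action": "status", "responseFor": response_for})


def checkpoint_message(sequence_number):
    """Build the one-line checkpoint request for the given sequence number."""
    return json.dumps({"action": "checkpoint", "checkpoint": sequence_number})


print(status_message("initialize"))
print(checkpoint_message("12345"))
```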

High Level Description Of Protocol

The child process will be started by the KCL multi-lang application. There will be one child process for each shard that this worker is assigned to. The multi-lang app will send an initialize, processRecords, or shutdown message upon invocation of its corresponding methods. Each message will be on a single line, and messages will be separated by newlines. The child process is expected to read these messages off its STDIN line by line. The child process must respond over its STDOUT with a status message indicating that it has finished performing the most recent action. The multi-lang daemon will not begin to send another message until it has received the response for the previous message.
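
The read, act, acknowledge cycle described above might look like the following from the child's side. This is a sketch under assumptions: the `run` function and the `handlers` mapping are hypothetical names, blank lines are skipped as a convenience, and an action with no registered handler raises `KeyError` rather than being acknowledged. The demonstration at the end feeds in-memory streams instead of the real STDIN and STDOUT.

```python
import io
import json
import sys


def run(handlers, stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON action per line, dispatch it, then acknowledge it."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # assumption: ignore blank lines between messages
        message = json.loads(line)
        action = message["action"]
        handlers[action](message)  # KeyError if the action is unknown
        reply = {"action": "status", "responseFor": action}
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()  # the sender waits for this line before continuing


# Demonstration with in-memory streams standing in for STDIN and STDOUT.
seen = []
demo_in = io.StringIO(
    '{"action": "initialize", "shardId": "shard-0"}\n'
    '{"action": "shutdown", "reason": "TERMINATE"}\n'
)
demo_out = io.StringIO()
run(
    {
        "initialize": lambda m: seen.append(m["shardId"]),
        "shutdown": lambda m: seen.append(m["reason"]),
    },
    stdin=demo_in,
    stdout=demo_out,
)
```

Flushing after every write matters here: without it, a buffered STDOUT could hold the status line back while the other side is still waiting for it.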

Checkpointing Behavior

The child process may send a checkpoint message at any time after receiving a processRecords or shutdown action and before sending the corresponding status message back to the processor. After sending a checkpoint message over STDOUT, the child process is expected to immediately begin to read its STDIN, waiting for the checkpoint result message from the KCL multi-lang processor.
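
The checkpoint exchange described above can be sketched as a single blocking call: write the request, then read the next line as the result. The function name `checkpoint` is hypothetical, and treating a missing or null "error" field as success is an assumption; the message format only shows that an exception name may appear there.

```python
import io
import json
import sys


def checkpoint(sequence_number, stdin=sys.stdin, stdout=sys.stdout):
    """Send a checkpoint request and block until its result arrives.

    Returns the "error" field of the result, or None when the field is
    absent or null (assumed here to mean success).
    """
    request = {"action": "checkpoint", "checkpoint": sequence_number}
    stdout.write(json.dumps(request) + "\n")
    stdout.flush()
    result = json.loads(stdin.readline())
    if result.get("action") != "checkpoint":
        raise RuntimeError("expected a checkpoint result, got: %r" % (result,))
    return result.get("error")


# Demonstration with in-memory streams standing in for STDIN and STDOUT.
fake_in = io.StringIO('{"action": "checkpoint", "checkpoint": "42", "error": null}\n')
fake_out = io.StringIO()
error = checkpoint("42", stdin=fake_in, stdout=fake_out)
```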

Protocol From Child Process Perspective

Initialize

  1. Read an "initialize" action from STDIN
  2. Perform initialization steps
  3. Write "status" message to STDOUT to indicate you are done.
  4. Begin reading a line from STDIN to receive the next action

ProcessRecords

  1. Read a "processRecords" action from STDIN
  2. Perform processing tasks (you may write a checkpoint message at any time)
  3. Write "status" message to STDOUT to indicate you are done.
  4. Begin reading a line from STDIN to receive the next action

Shutdown

  1. Read a "shutdown" action from STDIN
  2. Perform shutdown tasks (you may write a checkpoint message at any time)
  3. Write "status" message to STDOUT to indicate you are done.
  4. Begin reading a line from STDIN to receive the next action

Checkpoint

  1. Read a "checkpoint" action from STDIN
  2. Decide whether to checkpoint again based on whether the message reports an error.
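
One possible way to make the "checkpoint again" decision is a bounded retry loop. Everything here is illustrative: `checkpoint_with_retry`, the `retryable` set, the `max_attempts` limit, and the error name "SomeTransientError" are hypothetical and not defined by the protocol. The `send_checkpoint` argument is assumed to be any callable that performs one checkpoint exchange and returns None on success or the error name otherwise.

```python
def checkpoint_with_retry(send_checkpoint, sequence_number, retryable, max_attempts=3):
    """Retry a checkpoint while the reported error is considered retryable.

    Returns the last error name; None means the checkpoint succeeded.
    """
    error = None
    for _ in range(max_attempts):
        error = send_checkpoint(sequence_number)
        if error is None or error not in retryable:
            break  # success, or an error that retrying will not fix
    return error


# Demonstration: the first attempt fails with a retryable error, the second succeeds.
outcomes = iter(["SomeTransientError", None])
calls = []


def fake_send(sequence_number):
    calls.append(sequence_number)
    return next(outcomes)


result = checkpoint_with_retry(fake_send, "42", retryable={"SomeTransientError"})
```

The sketch retries immediately; a real child process would probably wait between attempts.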

Base 64 Encoding

The “data” field of the processRecords action message is an array of arbitrary bytes. To send it in a JSON string, we apply base 64 encoding, which transforms the byte array into a string that contains no JSON special symbols or newlines. The multi-lang processor uses the Jackson library, which uses a variant of MIME called MIME_NO_LINEFEEDS (see the Jackson documentation for more details). MIME is the basis of most base64 encoding variants, including RFC 3548, which is the standard used by Python’s base64 module.
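
As a small illustration, the snippet below builds a sample processRecords line and decodes each record's “data” field back into bytes with Python's base64 module. The payload, partition key, and sequence number are made-up values, and the encoding step only stands in for what the multi-lang processor would send.

```python
import base64
import json

# Build a sample message the way the sender would: bytes -> base64 text -> JSON.
line = json.dumps({
    "action": "processRecords",
    "records": [{
        "data": base64.b64encode(b"hello, shard").decode("ascii"),
        "partitionKey": "pk-1",
        "sequenceNumber": "1",
    }],
})

# On the child side: parse the line, then decode each record's data field.
message = json.loads(line)
payloads = [base64.b64decode(record["data"]) for record in message["records"]]
print(payloads)  # [b'hello, shard']
```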