The natural way to use the KCL
The multi-language protocol defines a system for communication between a KCL multi-lang application and another process (referred to as the “child process”) over the STDIN and STDOUT of the child process. The units of communication are JSON messages that represent the actions the receiving entity should perform. The child process is responsible for reacting appropriately to four different messages: initialize, processRecords, checkpoint, and shutdown. The KCL multi-lang app is responsible for reacting appropriately to two messages generated by the child process: status and checkpoint.
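A minimal sketch of the message vocabulary described above, written in Python to match the document's reference to Python's base64 module. It assumes each message carries its type in an "action" field and that a status reply names the action it answers in a "responseFor" field; both field names are assumptions, not something this section specifies. The helper names parse_incoming and encode_outgoing are hypothetical.

```python
import json

# Actions the KCL multi-lang app sends to the child process.
INCOMING_ACTIONS = {"initialize", "processRecords", "checkpoint", "shutdown"}
# Actions the child process sends back to the KCL multi-lang app.
OUTGOING_ACTIONS = {"status", "checkpoint"}


def parse_incoming(line: str) -> dict:
    """Decode one JSON line from the multi-lang app and check its action.

    Assumption: the action name lives in an "action" field.
    """
    message = json.loads(line)
    action = message.get("action")
    if action not in INCOMING_ACTIONS:
        raise ValueError(f"unexpected action from KCL: {action!r}")
    return message


def encode_outgoing(action: str, **fields) -> str:
    """Serialize a child-to-KCL message as one compact JSON line."""
    if action not in OUTGOING_ACTIONS:
        raise ValueError(f"child process may not send action {action!r}")
    # json.dumps (without indent) never emits raw newlines, so the whole
    # message stays on one line; the trailing "\n" terminates it.
    return json.dumps({"action": action, **fields}, separators=(",", ":")) + "\n"
```

For example, encode_outgoing("status", responseFor="initialize") yields the single line {"action":"status","responseFor":"initialize"} followed by a newline.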
The child process will be started by the KCL multi-lang application. There will be one child process for each shard that this worker is assigned to. The multi-lang app will send an initialize, processRecords, or shutdown message upon invocation of its corresponding methods. Each message is on a single line, and messages are separated by newlines. The child process is expected to read these messages from its STDIN line by line. The child process must respond over its STDOUT with a status message indicating that it has finished performing the most recent action. The multi-lang daemon will not begin to send another message until it has received the response for the previous message.
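The request/response loop described above can be sketched as follows. This is an illustration under assumptions, not the KCL's own client code: run_child, the handler dictionary, and the demo function are hypothetical, and the field names "action", "responseFor", "shardId", and "reason" are assumed rather than taken from this section. Streams are passed in as parameters so the loop can be exercised with in-memory files.

```python
import io
import json
import sys
from typing import Callable, Dict, List, Optional, TextIO, Tuple

# Hypothetical handler signature: receives the decoded message dict.
Handler = Callable[[dict], None]


def run_child(handlers: Dict[str, Handler],
              stdin: Optional[TextIO] = None,
              stdout: Optional[TextIO] = None) -> None:
    """Read one JSON message per line; answer each with a status message."""
    stdin = stdin if stdin is not None else sys.stdin
    stdout = stdout if stdout is not None else sys.stdout
    while True:
        line = stdin.readline()  # blocks until the daemon sends a line
        if not line:             # EOF: the daemon closed our STDIN
            break
        if not line.strip():
            continue
        message = json.loads(line)
        action = message["action"]
        handlers[action](message)  # perform the requested action
        # Report completion; the daemon waits for this before sending more.
        stdout.write(json.dumps({"action": "status", "responseFor": action}) + "\n")
        stdout.flush()


def demo() -> Tuple[List[str], List[dict]]:
    """Simulate two daemon messages using in-memory streams."""
    seen: List[str] = []
    fake_stdin = io.StringIO(
        '{"action": "initialize", "shardId": "shard-1"}\n'
        '{"action": "shutdown", "reason": "TERMINATE"}\n'
    )
    fake_stdout = io.StringIO()
    handlers: Dict[str, Handler] = {
        "initialize": lambda m: seen.append(m["shardId"]),
        "processRecords": lambda m: None,
        "shutdown": lambda m: seen.append(m["reason"]),
    }
    run_child(handlers, fake_stdin, fake_stdout)
    replies = [json.loads(l) for l in fake_stdout.getvalue().splitlines()]
    return seen, replies
```

Because STDOUT is the protocol channel, a real child process should keep diagnostic output off it (for example by logging to STDERR); any stray line on STDOUT would be read by the daemon as a protocol message.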
The child process may send a checkpoint message at any time after receiving a processRecords or shutdown action and before sending the corresponding status message back to the processor. After sending a checkpoint message over STDOUT, the child process is expected to immediately begin to read its STDIN, waiting for the checkpoint result message from the KCL multi-lang processor.
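The checkpoint exchange above can be sketched like this: write one checkpoint line to STDOUT, then immediately block on STDIN for the result. The field names "checkpoint" (carrying a sequence number) and "error" (reporting a failure) are assumptions, as are the names request_checkpoint and CheckpointError; this section does not define the message body.

```python
import io
import json
from typing import TextIO, Tuple


class CheckpointError(Exception):
    """Hypothetical error type for a failed or malformed checkpoint reply."""


def request_checkpoint(sequence_number: str, stdin: TextIO, stdout: TextIO) -> dict:
    """Send a checkpoint request and wait for the KCL's checkpoint result."""
    # Assumed message shape: {"action": "checkpoint", "checkpoint": <seq>}.
    request = {"action": "checkpoint", "checkpoint": sequence_number}
    stdout.write(json.dumps(request) + "\n")
    stdout.flush()
    # Per the protocol, read STDIN right away; the next line is the result.
    line = stdin.readline()
    if not line:
        raise CheckpointError("STDIN closed before the checkpoint result arrived")
    reply = json.loads(line)
    if reply.get("action") != "checkpoint":
        raise CheckpointError(f"expected a checkpoint result, got {reply!r}")
    if reply.get("error"):  # assumed: a non-empty "error" signals failure
        raise CheckpointError(reply["error"])
    return reply


def demo() -> Tuple[dict, dict]:
    """Simulate a successful checkpoint round trip with in-memory streams."""
    fake_stdin = io.StringIO(
        '{"action": "checkpoint", "checkpoint": "123", "error": null}\n'
    )
    fake_stdout = io.StringIO()
    reply = request_checkpoint("123", fake_stdin, fake_stdout)
    return json.loads(fake_stdout.getvalue()), reply
```

In a full child process this function would be called from inside a processRecords or shutdown handler, before that handler returns and the status message is sent, which matches the ordering the protocol requires.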
The “data” field of the processRecords action message is an array of arbitrary bytes. To send it in a JSON string, we apply base64 encoding, which transforms the byte array into a string (specifically, this string contains no JSON special characters or newlines). The multi-lang processor uses the Jackson library, which uses a base64 variant called MIME_NO_LINEFEEDS (see the Jackson documentation for more details). MIME is the basis of most base64 encoding variants, including RFC 3548, which is the standard used by Python’s base64 module.
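On the Python side, the "data" field can be handled with the standard library's base64 module, as sketched below. The helper names encode_data and decode_data are hypothetical; the sketch assumes the field holds standard-alphabet base64 with "=" padding and no line breaks, as described above.

```python
import base64
import json
from typing import Tuple


def encode_data(payload: bytes) -> str:
    """Base64-encode raw record bytes for embedding in a JSON string."""
    # b64encode uses the standard alphabet (A-Z a-z 0-9 + /) with "=" padding
    # and, unlike base64.encodebytes, never inserts newlines.
    return base64.b64encode(payload).decode("ascii")


def decode_data(encoded: str) -> bytes:
    """Recover the original bytes from a record's "data" field."""
    # validate=True rejects characters outside the base64 alphabet instead
    # of silently discarding them.
    return base64.b64decode(encoded, validate=True)


def demo() -> Tuple[bytes, str, bytes]:
    """Round-trip every possible byte value through a one-line JSON record."""
    payload = bytes(range(256))
    record = {"data": encode_data(payload)}
    line = json.dumps(record)
    restored = decode_data(json.loads(line)["data"])
    return payload, line, restored
```

For example, encode_data(b"hello") returns "aGVsbG8=", and decode_data("aGVsbG8=") returns b"hello".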