Configuration and customization¶
This chapter describes how to configure and customize Karton services, including ready-made ones available on PyPI/GitHub.
Basic configuration¶
Karton services can be configured in several ways. Let’s start with a basic configuration file.
[s3]
secret_key = minioadmin
access_key = minioadmin
address = http://localhost:9000
bucket = karton
[redis]
host=localhost
port=6379
Configuration values are read from the following sources, in order; values from later sources override earlier ones:
- /etc/karton/karton.ini file (global)
- ~/.config/karton/karton.ini file (user local)
- ./karton.ini file (subsystem local)
- --config-file <path> optional, additional path provided in arguments
- KARTON_SECTION_OPTION values from environment variables (e.g. the secret_key option in the [s3] section can be overridden using the KARTON_S3_SECRET_KEY variable)
- Command-line arguments (if the Karton.main() method is used as entrypoint)
You can build your configuration hierarchically, e.g. by providing common settings in /etc/karton/karton.ini, service-specific settings in a local ./karton.ini, and secrets in environment variables.
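The layering described above can be sketched with the standard library alone. This is an illustration of the idea, not Karton's own loader: `resolve_config` is a hypothetical helper, the INI contents are made up, and the way the environment variable name is split into section and option is an assumption.

```python
import configparser
from typing import Iterable, Mapping


def resolve_config(
    ini_sources: Iterable[str], env: Mapping[str, str]
) -> configparser.ConfigParser:
    """Merge INI sources in order, then apply KARTON_<SECTION>_<OPTION> overrides."""
    parser = configparser.ConfigParser()
    for text in ini_sources:
        # Options from a later source replace those set by earlier sources
        parser.read_string(text)
    for name, value in env.items():
        # Assumed naming scheme: KARTON, then section, then option
        # (the option itself may contain underscores)
        parts = name.split("_", 2)
        if len(parts) != 3 or parts[0] != "KARTON":
            continue
        section, option = parts[1].lower(), parts[2].lower()
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, option, value)
    return parser


# Hypothetical contents of a global and a service-local karton.ini
global_ini = "[s3]\naddress = http://localhost:9000\nbucket = karton\n"
local_ini = "[s3]\nbucket = my-service\n[redis]\nhost = localhost\n"
env = {"KARTON_S3_SECRET_KEY": "from-env", "HOME": "/root"}

config = resolve_config([global_ini, local_ini], env)
print(config.get("s3", "bucket"))      # my-service
print(config.get("s3", "secret_key"))  # from-env
```

Common settings stay in the global file, the local file overrides only what differs, and the secret never has to be written to disk.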
Common Karton configuration fields are listed below:
Section | Option | Description |
---|---|---|
[s3] | address | S3 API address |
[s3] | access_key | S3 API access key (username) |
[s3] | secret_key | S3 API secret key (password) |
[s3] | iam_auth | S3 IAM configuration (default: False) |
[s3] | bucket | Default bucket name for storing produced resources |
[redis] | host | Redis server hostname |
[redis] | port | Redis server port |
[redis] | db | Redis server database id (default: 0) |
[redis] | username | Redis server AUTH username (default: None) |
[redis] | password | Redis server AUTH password (default: None) |
[redis] | socket_timeout | Socket timeout for Redis operations in seconds (default: 30, use 0 to turn off if timeout doesn’t work properly) |
[karton] | identity | Karton service identity override (overrides the name provided in class / constructor arguments) |
[karton] | persistent | Karton service queue persistency override |
[karton] | debug | Karton debug mode for service development |
[karton] | task_timeout | Karton service task execution timeout in seconds. Useful if your service sometimes hangs. Karton will schedule SIGALRM if this value is set. |
[logging] | level | Logging level for Karton service logger (default: INFO) |
[signaling] | status | Turns on producing of ‘karton.signaling.status’ tasks, signalling the task start and finish events by Karton service (default: 0, off) |
Note that if both iam_auth = True and the access_key, secret_key pair are provided in the configuration file, Karton will first try to load secrets via the IAM provider and will fall back to the provided pair otherwise. More information about credential loading can be found here.
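The lookup order from the note above can be pictured with a small sketch. This is not Karton's or any S3 client's actual code: `pick_credentials` and the `iam_provider` callable are hypothetical names used only for illustration, and the sketch assumes the IAM provider returns nothing when it cannot supply secrets.

```python
from typing import Callable, Optional, Tuple

Credentials = Tuple[str, str]  # (access_key, secret_key)


def pick_credentials(
    iam_auth: bool,
    iam_provider: Callable[[], Optional[Credentials]],
    access_key: Optional[str],
    secret_key: Optional[str],
) -> Credentials:
    """Prefer IAM-provided secrets, then fall back to the configured pair."""
    if iam_auth:
        creds = iam_provider()
        if creds is not None:
            return creds
    # Reached when IAM is disabled or yielded nothing
    if access_key and secret_key:
        return (access_key, secret_key)
    raise RuntimeError("no S3 credentials available")


# IAM is enabled but unavailable here, so the configured pair is used
print(pick_credentials(True, lambda: None, "minioadmin", "minioadmin"))
```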
Karton System configuration¶
Most core services can be tuned depending on your needs. Custom service configuration is handled the same way as general Karton configuration.
A good example is Karton System:
Section | Option | Description |
---|---|---|
[system] | gc_interval | Spawn interval for garbage collection tasks in seconds. Default is 3 minutes. |
[system] | task_dispatched_timeout | Timeout for tasks that are stuck in DISPATCHED state (e.g. Producer crashed during upload of resources). Default is 24 hours. |
[system] | task_started_timeout | Timeout for tasks that are stuck in STARTED state (e.g. non-graceful crash of Consumer during task processing). Default is 24 hours. |
[system] | task_crashed_timeout | Timeout for removal of crashed tasks. Default is 3 days. |
[system] | enable_gc | Enable garbage collection. GC can be turned off if you want to scale up routing using several Karton System instances. |
[system] | enable_router | Enable task routing. Routing can be turned off if you want to use dedicated Karton System instance for GC. |
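The defaults in this table are given as durations. The sketch below converts them to seconds and renders a [system] section with the standard library. The table states seconds only for gc_interval; treating the three timeouts as seconds too is an assumption made for this example.

```python
import configparser
import io
from datetime import timedelta

# Defaults as listed in the table above
defaults = {
    "gc_interval": timedelta(minutes=3),
    "task_dispatched_timeout": timedelta(hours=24),
    "task_started_timeout": timedelta(hours=24),
    "task_crashed_timeout": timedelta(days=3),
}

parser = configparser.ConfigParser()
# ConfigParser stores strings, so convert each duration to whole seconds
parser["system"] = {
    name: str(int(delta.total_seconds())) for name, delta in defaults.items()
}

buffer = io.StringIO()
parser.write(buffer)
system_ini = buffer.getvalue()
print(system_ini)
```

Under that assumption, 3 minutes becomes 180, 24 hours becomes 86400, and 3 days becomes 259200.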
All of these settings can also be set using command-line arguments.
$ karton-system --help
usage: karton-system [-h] [--version] [--config-file CONFIG_FILE] [--identity IDENTITY] [--log-level LOG_LEVEL] [--setup-bucket] [--disable-gc] [--disable-router] [--gc-interval GC_INTERVAL]
                     [--task-dispatched-timeout TASK_DISPATCHED_TIMEOUT] [--task-started-timeout TASK_STARTED_TIMEOUT] [--task-crashed-timeout TASK_CRASHED_TIMEOUT]

Karton message broker.

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --config-file CONFIG_FILE
                        Alternative configuration path
  --identity IDENTITY   Alternative identity for Karton service
  --log-level LOG_LEVEL
                        Logging level of Karton logger
  --debug               Enable debugging mode
  --setup-bucket        Create missing bucket in S3 storage
  --disable-gc          Do not run GC in this instance
  --disable-router      Do not run task routing in this instance
  --gc-interval GC_INTERVAL
                        Garbage collection interval
  --task-dispatched-timeout TASK_DISPATCHED_TIMEOUT
                        Timeout for non-enqueued tasks stuck in Dispatched state (non-graceful shutdown of producer)
  --task-started-timeout TASK_STARTED_TIMEOUT
                        Timeout for non-enqueued tasks stuck in Started state (non-graceful shutdown of consumer)
  --task-crashed-timeout TASK_CRASHED_TIMEOUT
                        Timeout for tasks in Crashed state
Extending configuration¶
During development of your own Karton services you may want to provide your own configuration fields.
All configuration values set in karton.ini files and KARTON_ environment variables are available in the self.config object and don’t require additional definition.
The only thing that needs to be extended is the argument parser, if you want to use command-line arguments. Fortunately, Karton classes expose dedicated methods for this purpose.
import argparse

from karton.core import Config, Karton, Task


class SmolKarton(Karton):
    identity = "karton.smol"
    filters = [{
        "type": "smol-tasks"
    }]

    def process(self, task: Task) -> None:
        # Apply the size limit only if the option was configured
        if self.config.has_option("smol", "how_smol"):
            how_smol = self.config.getint("smol", "how_smol")
            if task.headers["size"] > how_smol:
                # Task is not smol enough UwU
                return

        ...

    @classmethod
    def args_parser(cls) -> argparse.ArgumentParser:
        # Remember to call super method to include base arguments
        parser = super().args_parser()
        parser.add_argument(
            "--how-smol",
            type=int,
            default=None,
            help="Sets size limit for tasks",
        )
        return parser

    @classmethod
    def config_from_args(cls, config: Config, args: argparse.Namespace) -> None:
        # Remember to call super method to include base arguments
        super().config_from_args(config, args)
        # Override the configured value only if --how-smol was passed
        if args.how_smol is not None:
            config.load_from_dict(
                {
                    "smol": {
                        "how_smol": args.how_smol,
                    }
                }
            )


if __name__ == "__main__":
    SmolKarton.main()
The args_parser method exposes the argparse.ArgumentParser that is used for handling CLI arguments. Values from argparse are then passed to config_from_args, which maps arguments into sections and options of the configuration.
That mechanism allows you to define your own arguments and include these values in the final configuration.
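To see the argument-to-configuration mapping in isolation, here is a minimal sketch that uses only argparse and a plain dictionary, with no Karton dependency. `build_parser` and `args_to_config` are hypothetical stand-ins for args_parser and config_from_args; the nested dictionary has the section/option shape described above.

```python
import argparse
from typing import Any, Dict


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Argument mapping sketch")
    # argparse stores --how-smol under the attribute name how_smol
    parser.add_argument(
        "--how-smol", type=int, default=None, help="Sets size limit for tasks"
    )
    return parser


def args_to_config(args: argparse.Namespace) -> Dict[str, Dict[str, Any]]:
    """Map parsed arguments to a {section: {option: value}} dictionary."""
    config: Dict[str, Dict[str, Any]] = {}
    # Leave the section out entirely when the flag was not given
    if args.how_smol is not None:
        config["smol"] = {"how_smol": args.how_smol}
    return config


with_flag = args_to_config(build_parser().parse_args(["--how-smol", "1024"]))
without_flag = args_to_config(build_parser().parse_args([]))
print(with_flag)     # {'smol': {'how_smol': 1024}}
print(without_flag)  # {}
```

Defaulting the argument to None keeps an omitted flag from overwriting a value that was already set in karton.ini or an environment variable.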
Customizing ready-made Karton services¶
Ready-made Karton services like karton-mwdb-reporter come with a predefined set of filters and emitted headers.
If you want to extend or override them without forking the whole project, you can simply subclass the service class and override the things you need.
from karton.mwdb_reporter import MWDBReporter


class CustomMWDBReporter(MWDBReporter):
    # Keep the original filters and add one more
    filters = [
        *MWDBReporter.filters,
        {"type": "sample", "stage": "my-stage"}
    ]


if __name__ == "__main__":
    CustomMWDBReporter.main()
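The unpacking pattern used for filters can be tried without installing any service. In the sketch below, `BaseReporter` is a hypothetical stand-in for a ready-made service and its filter values are made up; the point is that building a new list leaves the parent's filters untouched.

```python
class BaseReporter:
    # Stand-in for a ready-made service; this filter is invented for the example
    filters = [{"type": "sample", "stage": "recognized"}]


class CustomReporter(BaseReporter):
    # A new list: the parent's entries followed by the extra filter
    filters = [
        *BaseReporter.filters,
        {"type": "sample", "stage": "my-stage"},
    ]


print(len(BaseReporter.filters), len(CustomReporter.filters))  # 1 2
```

Calling `append` on the inherited list instead would modify the parent class's filters as well, because both names would point to the same list object.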
Warning
It’s recommended to pin the specific version of the service you derive from, in case a later release introduces conflicting changes.