REALTIMEKIT Realtime Policy and Watchdog Daemon

GIT:
        git://git.0pointer.de/rtkit.git

GITWEB:
        http://git.0pointer.de/?p=rtkit.git

NOTES:
        RealtimeKit is a D-Bus system service that changes the
        scheduling policy of user processes/threads to SCHED_RR
        (i.e. realtime scheduling mode) on request. It is intended to
        be used as a secure mechanism to allow real-time scheduling to
        be used by normal user processes.

        RealtimeKit enforces strict policies when handing out
        real-time scheduling to user threads:

        * Only clients with RLIMIT_RTTIME set will get RT scheduling.

        * RT scheduling will only be handed out to processes with
          SCHED_RESET_ON_FORK set, to guarantee that the scheduling
          settings cannot 'leak' to child processes, thus making sure
          that 'RT fork bombs' cannot be used to bypass RLIMIT_RTTIME
          and take the system down.

        * Limits are enforced on all user-controllable resources: only
          a maximum number of users, processes and threads can request
          RT scheduling at the same time.

        * Only a limited number of threads may be made RT in a
          specific time frame.

        * Client authorization is verified with PolicyKit.

        RealtimeKit can also be used to hand out high priority
        scheduling (i.e. negative nice level) to user processes.

        In addition to this a-priori policy enforcement, RealtimeKit
        also provides a-posteriori policy enforcement, i.e. it
        includes a canary-based watchdog that automatically demotes
        all real-time threads to SCHED_OTHER should the system
        overload despite the logic pointed out above. For more
        information regarding canary-based RT watchdogs, see the
        Acknowledgments section below.

        In its duty to manage real-time scheduling *securely*,
        RealtimeKit runs as an unprivileged user, and uses
        capabilities, resource limits and chroot() to minimize its
        security impact.

        RealtimeKit probably has little use in embedded or server use
        cases; use RLIMIT_RTPRIO there instead.

WHY:
        If processes that have real-time scheduling privileges enter a
        busy loop they can freeze the entire system.
        To make sure such run-away processes cannot do this,
        RLIMIT_RTTIME has been introduced. Being a per-process limit,
        it is however easily circumvented by combining a fork bomb
        with a busy loop.

        RealtimeKit hands out RT scheduling to specific threads that
        ask for it -- but only to those, and due to
        SCHED_RESET_ON_FORK it can be sure that this won't 'leak'.

        In contrast to RLIMIT_RTPRIO, the RealtimeKit logic makes sure
        that only a certain number of threads can be made realtime,
        per user, per process and per time interval.

CLIENTS:
        To make use of realtime scheduling, clients may request it
        through a small D-Bus interface that is accessible on the
        interface org.freedesktop.RealtimeKit1 as object
        /org/freedesktop/RealtimeKit1 on the service
        org.freedesktop.RealtimeKit1:

                void MakeThreadRealtime(u64 thread_id, u32 priority);

                void MakeThreadHighPriority(u64 thread_id, s32 priority);

        The thread IDs need to be passed as kernel tids as returned by
        gettid(), not as a pthread_t! Only threads belonging to the
        calling process can be made realtime/high priority. (Please
        note that glibc only provides a gettid() wrapper since version
        2.30; with older versions you need to implement it manually
        using syscall(). Consult the reference client implementation
        for details.)

        A BSD-licensed reference implementation of the client is
        available in rtkit.[ch] as part of the package. You may copy
        this into your sources if you wish. However, given how simple
        the D-Bus interface is, you might choose to write your own
        client implementation.

        It is advisable to try acquiring realtime scheduling with
        sched_setscheduler() first, so that systems where
        RLIMIT_RTPRIO is set can be supported.

        Here's an example using the reference implementation.
        Replace this:

                struct sched_param p;
                memset(&p, 0, sizeof(p));
                p.sched_priority = 3;
                sched_setscheduler(0, SCHED_RR|SCHED_RESET_ON_FORK, &p);

        by this:

                struct sched_param p;
                memset(&p, 0, sizeof(p));
                p.sched_priority = 3;
                /* Ask RealtimeKit only if we may not do it ourselves */
                if (sched_setscheduler(0, SCHED_RR|SCHED_RESET_ON_FORK, &p) < 0
                    && errno == EPERM)
                        rtkit_make_realtime(system_bus, 0, p.sched_priority);

        But of course add more appropriate error checking! Also,
        falling back to plain SCHED_RR when SCHED_RESET_ON_FORK causes
        EINVAL might be advisable.

DAEMON:
        The daemon is automatically started on first use via D-Bus
        system bus activation. Currently the daemon does not read any
        configuration file; however, it can be configured with command
        line parameters. You can edit

                /usr/share/dbus-1/system-services/org.freedesktop.RealtimeKit1.service

        to set those. Run

                /usr/libexec/rtkit-daemon --help

        to get a quick overview of the supported parameters and their
        defaults. Many of them should be obvious in their meaning. For
        the remaining ones see below:

        --max-realtime-priority= may be used to specify the maximum
        realtime priority a client can acquire through
        RealtimeKit. Please note that this value must be smaller than
        the value passed to --our-realtime-priority=.

        --our-realtime-priority= may be used to specify the realtime
        priority of the daemon itself. Please note that this priority
        is only used for a very short time while processing a client
        request. Normally the daemon will not be running with a
        realtime scheduling policy. The real-time priorities handed
        out to the user must be lower than this value (see above).

        --min-nice-level= may be used to specify the minimum nice
        level a client can acquire through RealtimeKit.

        --our-nice-level= may be used to specify the nice level the
        daemon itself uses most of the time (except when processing
        requests, see above).
        It is probably a good idea to set this to a small positive
        value, to make sure that if the system is already overloaded,
        handing out further RT scheduling will be delayed a bit.

        --rttime-usec-max= may be used to control which RLIMIT_RTTIME
        value clients need to have chosen at minimum before they may
        acquire RT scheduling through RealtimeKit.

        --users-max= specifies how many users may acquire RT
        scheduling at the same time for one or more of their
        processes.

        --processes-per-user-max= specifies how many processes per
        user may acquire RT scheduling at the same time.

        --threads-per-user-max= specifies how many threads per user
        may acquire RT scheduling at the same time. Of course this
        value should be set higher than --processes-per-user-max=.

        --actions-burst-sec= may be used to influence the rate
        limiting logic in RealtimeKit. The daemon will only pass
        realtime scheduling privileges to a maximum number of threads
        within this time frame (see below).

        --actions-per-burst-max= may be used to influence the rate
        limiting logic in RealtimeKit. The daemon will only pass
        realtime scheduling privileges to this number of threads
        within the time frame configured via --actions-burst-sec=.
        When this limit is reached, clients need to wait until that
        time passes before requesting RT scheduling again.

        --canary-cheep-msec= may be used to control how often the
        canary thread shall cheep.

        --canary-watchdog-msec= may be used to control how quickly the
        watchdog thread expects to receive a cheep from the canary
        thread. This value must be chosen larger than
        --canary-cheep-msec=. If --canary-watchdog-msec= is set to 10s
        and --canary-cheep-msec= to 7s, then the canary thread can
        trigger and deliver the cheep with a maximum latency of 3s.
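
        The relationship between the two canary intervals can be
        sketched in C as follows. This is only an illustration and not
        code taken from rtkit-daemon: the function names, and the idea
        of passing timestamps as plain millisecond counters, are
        assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of rtkit): computes how late a cheep
 * may arrive before the watchdog gives up. Returns false if the
 * configuration is invalid, i.e. the watchdog interval is not larger
 * than the cheep interval.
 */
bool canary_slack_msec(uint64_t watchdog_msec, uint64_t cheep_msec,
                       uint64_t *slack_msec) {
        if (watchdog_msec <= cheep_msec)
                return false;

        *slack_msec = watchdog_msec - cheep_msec;
        return true;
}

/*
 * Hypothetical watchdog check: returns true if no cheep has been seen
 * for longer than the watchdog interval, which is the situation in
 * which real-time threads would be demoted.
 */
bool watchdog_expired(uint64_t last_cheep_msec, uint64_t now_msec,
                      uint64_t watchdog_msec) {
        /* Guard against a clock value older than the last cheep */
        if (now_msec < last_cheep_msec)
                return false;

        return now_msec - last_cheep_msec > watchdog_msec;
}
```

        With a watchdog interval of 10000 ms and a cheep interval of
        7000 ms, canary_slack_msec() reports a slack of 3000 ms, which
        matches the 10s/7s example given for --canary-watchdog-msec=.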

ACKNOWLEDGMENTS:
        The canary watchdog logic is inspired by previous work of
        Vernon Mauery, Florian Schmidt, Kjetil Matheussen:

                http://rt.wiki.kernel.org/index.php/RT_Watchdog

LICENSE:
        GPLv3+ for the daemon
        BSD for the client reference implementation

AUTHOR:
        Lennart Poettering

REQUIREMENTS:
        Linux kernel >= 2.6.31
        D-Bus
        PolicyKit >= 0.92