Good design to expose debug info from kernel module

Sun Mar 29 00:14:19 IDT 2015

I've been bitten by text based serialization formats:

0) You have to write a parser/printer for each new protocol.

1) Separating messages in a stream is trickier with text based formats:

    int rv = read(fd, &len, sizeof(len));
    assert(rv > 0);
    rv = read(len, buf, sizeof(buf)));
    assert(rv > 0);

instead of (without handling errors):

    int len = 0;
    while ((eol = memchr(buf, '\n', len)) == NULL) {
        int n = read(buf + len, sizeof(buf)-len);
        assert(n > 0);
        len += n;
    }
    char *leftover = eol+1;
    int leftover_len = len-(eol+1-buf);
    // keep leftover, and add it to next read

2) Text based serialization are hard to parse. It's nice that their
readable from the terminal, but whenever I've had to analyze it a bit more,
I had to resort to crazy regexp. Instead of abstract description, here is a
real problem I was trying to solve. Sketch when did the machine context
switched lately. With dtrace, it was a few minute work, as I had the event
data parsed[0]. When using Linux, I had to trace ftrace's textual output,
which took time to get right, AND, was still fragile, as weird file names
could potentially break it[1].

3) While for simple protocols text based semi-protocol could work, my
experience is, when you have more complex requests (say, more than one
field, and variable length array), it becomes a burden. Especially if you
want to keep backwards compatibility. Using standard makes both consuming
and producing requests and response almost effortless.

4) I really think nowadays text is a bit obsolete. I can hardly think of a
case where text would be more convenient than, say, json or HTML. Web
browser is at your fingertips, and HTML table is easier to handle than
whitespace separated output based on your terminal size.
<tr><td>eth0</td><td>100</td></tr> is just as grep'able as the tab
separated version, and is easier to view, sort, etc.

[0] https://github.com/elazarl/cpu_affinity/blob/master/tracecpu.d
[1]
https://github.com/elazarl/cpu_affinity/blob/master/test/linux/plot_ftrace_sched_switch.py

On Sat, Mar 28, 2015 at 12:49 AM, Amos Shapira <amos.shapira at gmail.com>
wrote:

> If serialisation (aka "marshalling") is considered, how about making it
> text based?
> Then you can use simple shell tools to talk to it.
>
>
> On 27 March 2015 at 22:34, Elazar Leibovich <elazarl at gmail.com> wrote:
>
>> IMHO, C structs are no way near as usable as proper serialization
>> format. For example, what about optional fields? What about variable
>> length array? What about binary backwards compatibility? What about
>> supporting other languages? It's not trivial to take a C struct and
>> generate the proper struct.unpack string for it.
>>
>> Look at the complexity in perf_event_open(2), just parsing the event
>> stream takes a good chunk of code[0], with many potential bugs.
>> Parsing it with protobuf (or one of the other serialization formats)
>> would take three lines or so, would be more efficient, and would be
>> easier to program against, and less prone to bugs, etc.
>>
>> [0] Here is my take, and it's not even complete
>> https://gist.github.com/elazarl/c8404686e71ef0b36cc7
>>
>> On Fri, Mar 27, 2015 at 12:26 PM, guy keren <guy.choo.keren at gmail.com>
>> wrote:
>> >
>> > i imagine, if you use the proper 'packing' pragmas, you can simply
>> mempcy
>> > structures, without really writing serialization code (there's no
>> endianess
>> > issues, with both sides running on the same host, by definition).
>> >
>> > --guy
>> >
>> >
>> > On 03/27/2015 10:03 AM, Elazar Leibovich wrote:
>> >>
>> >> Thanks, didn't know netlink.
>> >>
>> >> You still need a solution to parse the sent message, where protocol
>> >> buffers etc, can help. (e.g., binary data into struct
>> >> mymodule_request).
>> >>
>> >> Or am I missing something?
>> >>
>> >> On Fri, Mar 27, 2015 at 3:33 AM, guy keren <guy.choo.keren at gmail.com>
>> >> wrote:
>> >>>
>> >>>
>> >>> take a look at this:
>> >>>
>> >>>
>> >>>
>> http://www.linuxfoundation.org/collaborate/workgroups/networking/generic_netlink_howto
>> >>>
>> >>> (link got broken - place it all on a single line)
>> >>>
>> >>> --guy
>> >>>
>> >>>
>> >>> On 03/26/2015 11:36 PM, Elazar Leibovich wrote:
>> >>>>
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> I'm writing a kernel module, and I want to expose some debug
>> >>>> information about it.
>> >>>>
>> >>>> The debug information is often of the form of request-response.
>> >>>>
>> >>>> For example:
>> >>>>
>> >>>> - Hey module, what's up with data at 0xffffe8ff0040c000?
>> >>>> - Cached, populated two hours ago.
>> >>>>
>> >>>> - Hey module, please invalidate data at 0xffffe8ff0002cb00
>> >>>> - Sure thing.
>> >>>>
>> >>>> - Hey module, please record all accesses to 0xffffe8ff0006bbf0.
>> >>>> - OK, ask me again for stats-5
>> >>>> ...
>> >>>> - Hey module, what's in stats-5?
>> >>>> - So far, 41 accesses by 22 users.
>> >>>>
>> >>>> Now, the question is, what is a good design to expose this
>> information.
>> >>>>
>> >>>> I think that the most reasonable way to interact with userspace is
>> >>>> through a debugfs file.
>> >>>>
>> >>>> The user would open the debugfs file in read+write mode, would write
>> a
>> >>>> request, and accept a response from it.
>> >>>>
>> >>>> As I see it, there are two fundamental problems needs to be solved:
>> >>>>
>> >>>> - Parsing the request from the client.
>> >>>> - Writing the response in a recognizeable format.
>> >>>>
>> >>>> A simple solution I first came up with, is to use a ad-hoc
>> >>>> request-response format. In my case, request and response are line
>> >>>> delimited, request is a hex address, and response is a translated hex
>> >>>> address.
>> >>>>
>> >>>> Here is the relevant snippet.
>> >>>>
>> >>>> struct pipe {
>> >>>>          DECLARE_KFIFO(fifo, T, (1<<4));
>> >>>>          wait_queue_head_t queue;
>> >>>>          char buf[100];
>> >>>>          int buflen;
>> >>>>          char resp[100];
>> >>>>          int resp_len;
>> >>>> };
>> >>>> static DEFINE_MUTEX(mutex);
>> >>>> static int open(struct inode *inode, struct file *file)
>> >>>> {
>> >>>>       struct pipe *pipe;
>> >>>>       if (!(file->f_mode & FMODE_READ) || !(file->f_mode &
>> FMODE_READ))
>> >>>> {
>> >>>>           pr_warn("must open with O_RDWR\n");
>> >>>>           return -EINVAL;
>> >>>>       }
>> >>>>       mutex_lock(&mutex);
>> >>>>       pipe = kzalloc(sizeof(*pipe), GFP_KERNEL);
>> >>>>       INIT_KFIFO(pipe->fifo);
>> >>>>       init_waitqueue_head(&pipe->queue);
>> >>>>       file->private = pipe;
>> >>>> }
>> >>>>
>> >>>> static int write(struct file *file, const char __user *ubuf, size_t
>> >>>> count, loff_t *ppos)
>> >>>> {
>> >>>>       char *eol;
>> >>>>       size_t n = min_t(size_t, count, sizeof(pipe->buf));
>> >>>>       struct pipe *pipe = file->private_data;
>> >>>>       if (copy_from_user(&pipe->buf[pipe->buflen], ubuf, n)
>> >>>>           return -EFAULT;
>> >>>>       eol = memchr(buf, '\n', n);
>> >>>>       if (eol == NULL)
>> >>>>           return count;
>> >>>>       *eol = '\0';
>> >>>>       // TODO: wait when queue full
>> >>>>       if (!kfifo_in(&pipe->fifo, processLine(buf), 1)
>> >>>>           return -EFAULT;
>> >>>>       wake_up_interruptible(&pipe->queue);
>> >>>>       memmove(&pipe->buf[0], &pipe->buf[n], pipe->buflen-n);
>> >>>> }
>> >>>>
>> >>>> static int read(struct file *file, const char __user *ubuf, size_t
>> >>>> count, loff_t *ppos)
>> >>>> {
>> >>>>       struct pipe *pipe = file->private_data;
>> >>>>       T req;
>> >>>>       wait_event_interruptible(pipe->queue, kfifo_out(&pipe->fifo,
>> &req,
>> >>>> 1));
>> >>>>       process_request(req, &pipe->resp, &pipe->resp_len);
>> >>>>       if (count < pipe->resp_len)
>> >>>>           return -EFAULT; // TODO: handle copy to client in parts
>> >>>>       if (copy_to_user(userbuf, buf, pipe->resp_len))
>> >>>>           return -EFAULT;
>> >>>> }
>> >>>>
>> >>>> Usage is:
>> >>>>
>> >>>> fd = io.FileIO("/debug/mymodule/file", "r+")
>> >>>> fd.write('req...')
>> >>>> print fd.read(100)
>> >>>>
>> >>>> This is not so robust, for many reasons (look how many bugs are in
>> >>>> this small and simple snippet), and some parts need to be repeated
>> for
>> >>>> each input type.
>> >>>>
>> >>>> What I've had in mind, in similar fashion to grpc.io, have the user
>> >>>> write a size prefixed protocol buffer object to the file, and
>> >>>> similarly read it as a response.
>> >>>>
>> >>>> Something like:
>> >>>>
>> >>>> fd = io.FileIO("/debug/mymodule/file", "r+")
>> >>>> fd.write(myReq.SerializeToString())
>> >>>> len = struct.unpack("<i", fd.read(4))
>> >>>> Resp.ParseFromString(fd.read(len))
>> >>>>
>> >>>> I believe it is not hard to create a kernel compatible protocol
>> buffer
>> >>>> code generator.
>> >>>>
>> >>>> When you have this in place, you have to write a very simple logic to
>> >>>> add a new functionality to the debugfs file. Handler would
>> essentially
>> >>>> get pointers to a request struct, and a response struct, and would
>> >>>> need to fill out the response struct.
>> >>>>
>> >>>> Are there similar solutions?
>> >>>> What problems might my approach cause?
>> >>>> Is there a better idea for this problem altogether?
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> _______________________________________________
>> >>>> Linux-il mailing list
>> >>>> Linux-il at cs.huji.ac.il
>> >>>> http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
>> >>>>
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> Linux-il mailing list
>> >>> Linux-il at cs.huji.ac.il
>> >>> http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
>> >
>> >
>>
>> _______________________________________________
>> Linux-il mailing list
>> Linux-il at cs.huji.ac.il
>> http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
>>
>
>
>
> --
> <http://au.linkedin.com/in/gliderflyer>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cs.huji.ac.il/pipermail/linux-il/attachments/20150329/6cf6819b/attachment.html>