18 Dec, 2007

1 commit


28 Nov, 2007

1 commit

  • Enable expensive bitmap scanning only if the DEBUG option is enabled.
    Bitmap scanning loads the CPU considerably; on my machine the write
    throughput of dd if=/dev/zero of=/ocfs2/file bs=1M count=500 conv=sync
    improves from 37 MB/s to 45.4 MB/s in local mode...

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     

13 Nov, 2007

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (21 commits)
    [CIFS] fix oops on second mount to same server when null auth is used
    [CIFS] Fix stale mode after readdir when cifsacl specified
    [CIFS] add mode to acl conversion helper function
    [CIFS] Fix incorrect mode when ACL had deny access control entries
    [CIFS] Add uid to key description so krb can handle user mounts
    [CIFS] Fix walking out end of cifs dacl
    [CIFS] Add upcall files for cifs to use spnego/kerberos
    [CIFS] add OIDs for KRB5 and MSKRB5 to ASN1 parsing routines
    [CIFS] Register and unregister cifs_spnego_key_type on module init/exit
    [CIFS] implement upcalls for SPNEGO blob via keyctl API
    [CIFS] allow cifs_calc_signature2 to deal with a zero length iovec
    [CIFS] If no Access Control Entries, set mode perm bits to zero
    [CIFS] when mount helper missing fix slash wrong direction in share
    [CIFS] Don't request too much permission when reading an ACL
    [CIFS] enable get mode from ACL when cifsacl mount option specified
    [CIFS] ACL support part 8
    [CIFS] acl support part 7
    [CIFS] acl support part 6
    [CIFS] acl support part 6
    [CIFS] remove unused funtion compile warning when experimental off
    ...

    Linus Torvalds
     

03 Nov, 2007

1 commit

  • Add routines to handle upcalls to userspace via keyctl for the purpose
    of getting a SPNEGO blob for a particular uid and server combination.

    Clean up the Makefile a bit and set it up to only compile cifs_spnego
    if CONFIG_CIFS_UPCALL is set. Also change CONFIG_CIFS_UPCALL to depend
    on CONFIG_KEYS rather than CONFIG_CONNECTOR.

    cifs_spnego.h defines the communications between kernel and userspace
    and is intended to be shared with userspace programs.

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     

31 Oct, 2007

1 commit


20 Oct, 2007

1 commit

  • The jbd-debug file used to be located in /proc/sys/fs/jbd-debug, but
    create_proc_entry() does not do lookups on file names that are more than
    one directory deep. This causes the entry creation to fail, and hence no
    proc file is created.

    Instead of fixing this in procfs, we might as well move the jbd-debug file
    to debugfs, which would be the preferred location for this kind of tunable.
    The new location is now /sys/kernel/debug/jbd/jbd-debug.

    [akpm@linux-foundation.org: zillions of cleanups]
    Signed-off-by: Jose R. Santos
    Acked-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jose R. Santos
     

18 Oct, 2007

1 commit

  • In pass1 of e2fsck, every inode table in the filesystem is scanned and checked,
    regardless of whether it is in use. This is the most time-consuming part
    of the filesystem check. The uninitialized block group feature can greatly
    reduce e2fsck time by eliminating checking of uninitialized inodes.

    With this feature, there is a high water mark of used inodes for each block
    group. Block and inode bitmaps can be uninitialized on disk via a flag in the
    group descriptor to avoid reading or scanning them at e2fsck time. A checksum
    of each group descriptor is used to ensure that corruption in the group
    descriptor's bit flags does not cause incorrect operation.

    The feature is enabled through a mkfs option

    mke2fs /dev/ -O uninit_groups

    A patch adding support for uninitialized block groups to e2fsprogs tools has
    been posted to the linux-ext4 mailing list.

    The patches have been stress tested with fsstress and fsx. In performance
    tests measuring e2fsck time, we have seen that e2fsck time on ext3 grows
    linearly with the total number of inodes in the filesystem. In ext4 with the
    uninitialized block groups feature, the e2fsck time is independent of the
    total inode count and depends solely on the number of used inodes.
    Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
    greatly reduce e2fsck time for users, with a performance improvement of 2-20
    times, depending on how full the filesystem is.

    The attached graph shows the major improvements in e2fsck times in filesystems
    with a large total inode count, but few inodes in use.

    In each group descriptor, if we have

    EXT4_BG_INODE_UNINIT set in bg_flags:
        The inode table is not initialized/used in this group, so we can
        skip the consistency check during fsck.
    EXT4_BG_BLOCK_UNINIT set in bg_flags:
        No block in the group is used, so we can skip the block bitmap
        verification for this group.

    We also add two new fields to the group descriptor as part of the
    uninitialized group patch.

    __le16 bg_itable_unused; /* Unused inodes count */
    __le16 bg_checksum; /* crc16(sb_uuid+group+desc) */

    bg_itable_unused:

    If EXT4_BG_INODE_UNINIT is not set in bg_flags, then
    bg_itable_unused gives the offset within the inode table
    up to which inodes are in use. fsck can use this to skip
    the inodes that are marked unused.

    bg_checksum:
    Now that we depend on bg_flags and bg_itable_unused to determine
    the block and inode usage, we need to make sure the group descriptor
    is not corrupt. We add a checksum to the group descriptor to
    detect corruption. If the descriptor is found to be corrupt, we
    mark all the blocks and inodes in the group as used.

    Signed-off-by: Avantika Mathur
    Signed-off-by: Andreas Dilger
    Signed-off-by: Mingming Cao
    Signed-off-by: Aneesh Kumar K.V

    Andreas Dilger
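
    As an illustration of the commit message above (not part of the original
    patch), the following is a minimal sketch of how a checker might use
    bg_flags, bg_itable_unused and bg_checksum to decide how many inodes of a
    group to scan. The struct is a hypothetical, simplified descriptor holding
    only the fields discussed; the flag values, the CRC-16 variant, the initial
    CRC value and the host-endian handling of the group number are assumptions
    made for this example, and the helper names are invented.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values assumed for this sketch. */
#define BG_INODE_UNINIT 0x0001
#define BG_BLOCK_UNINIT 0x0002

/* Hypothetical, simplified group descriptor: only the fields discussed. */
struct group_desc {
    uint16_t bg_flags;
    uint16_t bg_itable_unused;  /* unused inodes at the end of the table */
    uint16_t bg_checksum;       /* crc16(sb_uuid + group + desc) */
};

/* Bitwise CRC-16 with reflected polynomial 0xA001 (assumed variant). */
static uint16_t crc16(uint16_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
                            : (uint16_t)(crc >> 1);
    }
    return crc;
}

/* Checksum over the UUID, the group number, and the descriptor fields
 * that precede bg_checksum. Host byte order is used for simplicity. */
static uint16_t desc_checksum(const uint8_t uuid[16], uint32_t group,
                              const struct group_desc *gd)
{
    uint16_t crc = crc16(0xFFFF, uuid, 16);

    crc = crc16(crc, &group, sizeof(group));
    return crc16(crc, gd, offsetof(struct group_desc, bg_checksum));
}

/* Number of inodes a checker has to scan in this group. */
static uint32_t inodes_to_scan(const uint8_t uuid[16], uint32_t group,
                               const struct group_desc *gd,
                               uint32_t inodes_per_group)
{
    /* Corrupt descriptor: distrust the hints, treat everything as used. */
    if (desc_checksum(uuid, group, gd) != gd->bg_checksum)
        return inodes_per_group;
    /* Inode table never initialized: nothing to scan. */
    if (gd->bg_flags & BG_INODE_UNINIT)
        return 0;
    if (gd->bg_itable_unused > inodes_per_group)
        return inodes_per_group;
    return inodes_per_group - gd->bg_itable_unused;
}
```

    The fallback on a checksum mismatch mirrors the rule stated in the commit
    message: a corrupt descriptor must never cause used inodes to be skipped.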
     

17 Oct, 2007

4 commits

  • Turn Network File Systems into a menuconfig so that it can be disabled at
    once.

    (Note: I added a "default y". If you do not like that, speak up.)

    Signed-off-by: Jan Engelhardt
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Steven French
    Cc: David Howells
    Cc: Eric Van Hensbergen
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Engelhardt
     
  • Implement sending of quota messages via netlink interface. The advantage
    is that in userspace we can better decide what to do with the message - for
    example display a dialogue in your X session or just write the message to
    the console. As a bonus, we can get rid of problems with console locking
    deep inside filesystem code once we remove the old printing mechanism.

    Signed-off-by: Jan Kara
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Since CONFIG_RAMFS is currently hard-selected to "y", and since
    Documentation/filesystems/ramfs-rootfs-initramfs.txt reads as follows:

    "The amount of code required to implement ramfs is tiny, because all the
    work is done by the existing Linux caching infrastructure. Basically,
    you're mounting the disk cache as a filesystem. Because of this, ramfs is
    not an optional component removable via menuconfig, since there would be
    negligible space savings."

    It seems pointless to leave this as a Kconfig entry.

    Signed-off-by: Robert P. J. Day
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • Allow disabling DNOTIFY with CONFIG_EMBEDDED=n.

    I'm currently running a kernel with dnotify disabled and I haven't run into
    any problem. Is there any popular application left that breaks without
    dnotify support in the kernel?

    Note that this patch does not remove dnotify support, it still defaults to
    "y", and the help text recommends enabling it.

    Signed-off-by: Adrian Bunk
    Acked-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

16 Oct, 2007

1 commit

  • * git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
    NFSv4: Fix a typo in nfs_inode_reclaim_delegation
    NFS: Add a boot parameter to disable 64 bit inode numbers
    NFS: nfs_refresh_inode should clear cache_validity flags on success
    NFS: Fix a connectathon regression in NFSv3 and NFSv4
    NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
    SUNRPC: Don't call xprt_release in call refresh
    SUNRPC: Don't call xprt_release() if call_allocate fails
    SUNRPC: Fix buggy UDP transmission
    [23/37] Clean up duplicate includes in
    [2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
    SUNRPC: Use correct type in buffer length calculations
    SUNRPC: Fix default hostname created in rpc_create()
    nfs: add server port to rpc_pipe info file
    NFS: Get rid of some obsolete macros
    NFS: Simplify filehandle revalidation
    NFS: Ensure that nfs_link() returns a hashed dentry
    NFS: Be strict about dentry revalidation when doing exclusive create
    NFS: Don't zap the readdir caches upon error
    NFS: Remove the redundant nfs_reval_fsid()
    NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
    ...

    Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
    net/sunrpc/xprtsock.c

    Linus Torvalds
     

13 Oct, 2007

1 commit


10 Oct, 2007

2 commits


12 Sep, 2007

1 commit


02 Aug, 2007

1 commit


23 Jul, 2007

1 commit


20 Jul, 2007

1 commit


18 Jul, 2007

2 commits

  • The jbd2-debug file used to be located in /proc/sys/fs/jbd2-debug, but it
    incorrectly used create_proc_entry() instead of the sysctl routines, and
    no proc entry was ever created.

    Instead of fixing this we might as well move the jbd2-debug file to
    debugfs which would be the preferred location for this kind of tunable.
    The new location is now /sys/kernel/debug/jbd2/jbd2-debug.

    Signed-off-by: Jose R. Santos
    Signed-off-by: "Theodore Ts'o"

    Jose R. Santos
     
  • Select rpcsec_gss support whenever NFSv4 support is requested. The RFC actually
    requires gss, and gss is also the main reason to migrate to v4. We already do
    this on the client side.

    Signed-off-by: "J. Bruce Fields"
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     

17 Jul, 2007

1 commit

  • * master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: (68 commits)
    sh: sh-rtc support for SH7709.
    sh: Revert __xdiv64_32 size change.
    sh: Update r7785rp defconfig.
    sh: Export div symbols for GCC 4.2 and ST GCC.
    sh: fix race in parallel out-of-tree build
    sh: Kill off dead mach.c for hp6xx.
    sh: hd64461.h cleanup and added comments.
    sh: Update the alignment when 4K stacks are used.
    sh: Add a .bss.page_aligned section for 4K stacks.
    sh: Don't let SH-4A clobber SH-4 CFLAGS.
    sh: Add parport stub for SuperIO ports.
    sh: Drop -Wa,-dsp for DSP tuning.
    sh: Update dreamcast defconfig.
    fb: pvr2fb: A few more __devinit annotations for PCI.
    fb: pvr2fb: Fix up section mismatch warnings.
    sh: Select IPR-IRQ for SH7091.
    sh: Correct __xdiv64_32/div64_32 return value size.
    sh: Fix timer-tmu build for SH-3.
    sh: Add cpu and mach links to CLEAN_FILES.
    sh: Preliminary support for the SH-X3 CPU.
    ...

    Linus Torvalds
     

15 Jul, 2007

1 commit

  • This patchset moves non-filesystem interfaces of v9fs from fs/9p to net/9p.
    It moves the transport, packet marshalling and connection layers to net/9p
    leaving only the VFS related files in fs/9p. This work is being done in
    preparation for in-kernel 9p servers as well as alternate 9p clients (other
    than VFS).

    Signed-off-by: Latchesar Ionkov
    Signed-off-by: Eric Van Hensbergen

    Latchesar Ionkov
     

11 Jul, 2007

3 commits

  • Add a "favourlzo" compression mode to jffs2 which tries to
    optimise by size but gives lzo an advantage when comparing sizes.
    This means the faster lzo algorithm can be preferred when there
    isn't much difference in compressed size (the exact threshold can
    be changed).

    Signed-off-by: Richard Purdie
    Signed-off-by: David Woodhouse

    Richard Purdie
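
    As an illustration of the "favourlzo" idea described above (not the code
    from the patch), the following sketch compares two compression results by
    size while giving LZO an advantage. The enum, the function name is_better()
    and the 80% threshold are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical compressor identifiers for this sketch. */
enum compr_id { COMPR_ZLIB, COMPR_RTIME, COMPR_LZO };

/* Assumed threshold: an LZO result is weighted at 80% of its real size. */
#define FAVOUR_LZO_PERCENT 80

/* Return true if the candidate result should replace the current best.
 * Sizes are compared after weighting, so LZO wins near-ties even when
 * its output is somewhat larger. */
static bool is_better(enum compr_id cand, uint32_t cand_size,
                      enum compr_id best, uint32_t best_size)
{
    uint64_t c = (uint64_t)cand_size *
                 (cand == COMPR_LZO ? FAVOUR_LZO_PERCENT : 100);
    uint64_t b = (uint64_t)best_size *
                 (best == COMPR_LZO ? FAVOUR_LZO_PERCENT : 100);

    return c < b;
}
```

    With these assumptions, an LZO result of 1100 bytes beats a zlib result of
    1000 bytes, while a zlib result has to be below 880 bytes to displace that
    LZO result.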
     
  • Add LZO1X compression/decompression support to jffs2.

    LZO's interface doesn't entirely match that required by jffs2 so a
    buffer and memcpy is unavoidable.

    Signed-off-by: Richard Purdie
    Signed-off-by: David Woodhouse

    Richard Purdie
     
  • We've seen some evil corruption issues, where the corruption seems to be
    introduced after the JFFS2 crc32 is calculated but before the NAND
    controller calculates the ECC. So it's in RAM or in the PCI DMA
    transfer; not on the flash. Attempt to catch it earlier by (optionally)
    reading back from the flash immediately after writing it.

    Signed-off-by: David Woodhouse

    David Woodhouse
     

10 Jul, 2007

1 commit


11 Jun, 2007

1 commit


09 May, 2007

2 commits

  • The text removed by the following patch refers to functionality that never
    worked, to a nonexistent documentation file, and to mount options marked as
    obsolete in the module.

    Signed-off-by: Alexander E. Patrakov
    Signed-off-by: Adrian Bunk

    Alexander E. Patrakov
     
  • REISER_FS /proc option needs to depend on PROC_FS.

    fs/reiserfs/procfs.c: In function 'show_super':
    fs/reiserfs/procfs.c:134: error: 'reiserfs_proc_info_data_t' has no member named 'max_hash_collisions'
    fs/reiserfs/procfs.c:134: error: 'reiserfs_proc_info_data_t' has no member named 'breads'
    fs/reiserfs/procfs.c:135: error: 'reiserfs_proc_info_data_t' has no member named 'bread_miss'
    fs/reiserfs/procfs.c:135: error: 'reiserfs_proc_info_data_t' has no member named 'search_by_key'
    fs/reiserfs/procfs.c:136: error: 'reiserfs_proc_info_data_t' has no member named 'search_by_key_fs_changed'
    fs/reiserfs/procfs.c:136: error: 'reiserfs_proc_info_data_t' has no member named 'search_by_key_restarted'
    fs/reiserfs/procfs.c:137: error: 'reiserfs_proc_info_data_t' has no member named 'insert_item_restarted'
    fs/reiserfs/procfs.c:137: error: 'reiserfs_proc_info_data_t' has no member named 'paste_into_item_restarted'
    fs/reiserfs/procfs.c:138: error: 'reiserfs_proc_info_data_t' has no member named 'cut_from_item_restarted'
    fs/reiserfs/procfs.c:139: error: 'reiserfs_proc_info_data_t' has no member named 'delete_solid_item_restarted'
    fs/reiserfs/procfs.c:139: error: 'reiserfs_proc_info_data_t' has no member named 'delete_item_restarted'
    fs/reiserfs/procfs.c:140: error: 'reiserfs_proc_info_data_t' has no member named 'leaked_oid'
    fs/reiserfs/procfs.c:140: error: 'reiserfs_proc_info_data_t' has no member named 'leaves_removable'
    fs/reiserfs/procfs.c: In function 'show_per_level':
    fs/reiserfs/procfs.c:184: error: 'reiserfs_proc_info_data_t' has no member named 'balance_at'
    fs/reiserfs/procfs.c:185: error: 'reiserfs_proc_info_data_t' has no member named 'sbk_read_at'
    fs/reiserfs/procfs.c:186: error: 'reiserfs_proc_info_data_t' has no member named 'sbk_fs_changed'
    fs/reiserfs/procfs.c:187: error: 'reiserfs_proc_info_data_t' has no member named 'sbk_restarted'
    fs/reiserfs/procfs.c:188: error: 'reiserfs_proc_info_data_t' has no member named 'free_at'
    fs/reiserfs/procfs.c:189: error: 'reiserfs_proc_info_data_t' has no member named 'items_at'
    fs/reiserfs/procfs.c:190: error: 'reiserfs_proc_info_data_t' has no member named 'can_node_be_removed'
    fs/reiserfs/procfs.c:191: error: 'reiserfs_proc_info_data_t' has no member named 'lnum'
    fs/reiserfs/procfs.c:192: error: 'reiserfs_proc_info_data_t' has no member named 'rnum'
    fs/reiserfs/procfs.c:193: error: 'reiserfs_proc_info_data_t' has no member named 'lbytes'
    fs/reiserfs/procfs.c:194: error: 'reiserfs_proc_info_data_t' has no member named 'rbytes'
    fs/reiserfs/procfs.c:195: error: 'reiserfs_proc_info_data_t' has no member named 'get_neighbors'
    fs/reiserfs/procfs.c:196: error: 'reiserfs_proc_info_data_t' has no member named 'get_neighbors_restart'
    fs/reiserfs/procfs.c:197: error: 'reiserfs_proc_info_data_t' has no member named 'need_l_neighbor'
    fs/reiserfs/procfs.c:197: error: 'reiserfs_proc_info_data_t' has no member named 'need_r_neighbor'
    fs/reiserfs/procfs.c: In function 'show_bitmap':
    fs/reiserfs/procfs.c:224: error: 'reiserfs_proc_info_data_t' has no member named 'free_block'
    fs/reiserfs/procfs.c:225: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
    fs/reiserfs/procfs.c:226: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
    fs/reiserfs/procfs.c:227: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
    fs/reiserfs/procfs.c:228: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
    fs/reiserfs/procfs.c:229: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
    fs/reiserfs/procfs.c:230: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
    fs/reiserfs/procfs.c:230: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
    fs/reiserfs/procfs.c: In function 'show_journal':
    fs/reiserfs/procfs.c:384: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:385: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:386: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:387: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:388: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:389: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:390: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:391: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:392: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:393: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:394: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:395: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:395: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c:395: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
    fs/reiserfs/procfs.c: In function 'reiserfs_proc_info_init':
    fs/reiserfs/procfs.c:504: warning: implicit declaration of function '__PINFO'
    fs/reiserfs/procfs.c:504: error: request for member 'lock' in something not a structure or union
    fs/reiserfs/procfs.c: In function 'reiserfs_proc_info_done':
    fs/reiserfs/procfs.c:544: error: request for member 'lock' in something not a structure or union
    fs/reiserfs/procfs.c:545: error: request for member 'exiting' in something not a structure or union
    fs/reiserfs/procfs.c:546: error: request for member 'lock' in something not a structure or union

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

05 May, 2007

1 commit

  • * git://git.linux-nfs.org/pub/linux/nfs-2.6: (28 commits)
    NFS: Fix a compile glitch on 64-bit systems
    NFS: Clean up nfs_create_request comments
    spkm3: initialize hash
    spkm3: remove bad kfree, unnecessary export
    spkm3: fix spkm3's use of hmac
    NFS4: invalidate cached acl on setacl
    NFS: Fix directory caching problem - with test case and patch.
    NFS: Set meaningful value for fattr->time_start in readdirplus results.
    NFS: Added support to turn off the NFSv3 READDIRPLUS RPC.
    SUNRPC: RPC client should retry with different versions of rpcbind
    SUNRPC: remove old portmapper
    NFS: switch NFSROOT to use new rpcbind client
    SUNRPC: switch the RPC server to use the new rpcbind registration API
    SUNRPC: switch socket-based RPC transports to use rpcbind
    SUNRPC: introduce rpcbind: replacement for in-kernel portmapper
    SUNRPC: Eliminate side effects from rpc_malloc
    SUNRPC: RPC buffer size estimates are too large
    NLM: Shrink the maximum request size of NLM4 requests
    NFS: Use pgoff_t in structures and functions that pass page cache offsets
    NFS: Clean up nfs_sync_mapping_wait()
    ...

    Linus Torvalds
     

03 May, 2007

1 commit

  • Make miscellaneous fixes to AFS and AF_RXRPC:

    (*) Make AF_RXRPC select KEYS rather than RXKAD or AFS_FS in Kconfig.

    (*) Don't use FS_BINARY_MOUNTDATA.

    (*) Remove a done 'TODO' item in a comment on afs_get_sb().

    (*) Don't pass a void * as the page pointer argument of kmap_atomic() as this
    breaks on m68k. Patch from Geert Uytterhoeven.

    (*) Use match_*() functions rather than doing my own parsing.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

01 May, 2007

1 commit


28 Apr, 2007

1 commit

  • Fixes for various arch compilation problems:

    (*) Missing module exports.

    (*) Variable name collision when rxkad and af_rxrpc both built in
    (rxrpc_debug).

    (*) Large constant representation problem (AFS_UUID_TO_UNIX_TIME).

    (*) Configuration dependencies.

    (*) printk() format warnings.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

27 Apr, 2007

2 commits


18 Feb, 2007

1 commit


14 Feb, 2007

2 commits


13 Feb, 2007

1 commit

  • This is the transport code for public key functionality in eCryptfs. It
    manages encryption/decryption request queues with a transport mechanism.
    Currently, netlink is the only implemented transport.

    Each inode has a unique File Encryption Key (FEK). Under passphrase, a File
    Encryption Key Encryption Key (FEKEK) is generated from a salt/passphrase
    combo on mount. This FEKEK encrypts each FEK and writes it into the header of
    each file using the packet format specified in RFC 2440. This is all
    symmetric key encryption, so it can all be done via the kernel crypto API.

    These new patches introduce public key encryption of the FEK. There is no
    asymmetric key encryption support in the kernel crypto API, so eCryptfs pushes
    the FEK encryption and decryption out to a userspace daemon. After
    considering our requirements and determining the complexity of using various
    transport mechanisms, we settled on netlink for this communication.

    eCryptfs stores authentication tokens into the kernel keyring. These tokens
    correlate with individual keys. For passphrase mode of operation, the
    authentication token contains the symmetric FEKEK. For public key, the
    authentication token contains a PKI type and an opaque data blob managed by
    individual PKI modules in userspace.

    Each user who opens a file under an eCryptfs partition mounted in public key
    mode must be running a daemon. That daemon has the user's credentials and has
    access to all of the keys to which the user should have access. The daemon,
    when started, initializes the pluggable PKI modules available on the system
    and registers itself with the eCryptfs kernel module. Userspace utilities
    register public key authentication tokens into the user session keyring.
    These authentication tokens correlate key signatures with PKI modules and PKI
    blobs. The PKI blobs contain PKI-specific information necessary for the PKI
    module to carry out asymmetric key encryption and decryption.

    When the eCryptfs module parses the header of an existing file and finds a Tag
    1 (Public Key) packet (see RFC 2440), it reads in the public key identifier
    (signature). The asymmetrically encrypted FEK is in the Tag 1 packet;
    eCryptfs puts together a decrypt request packet containing the signature and
    the encrypted FEK, then it passes it to the daemon registered for the
    current->euid via a netlink unicast to the PID of the daemon, which was
    registered at the time the daemon was started by the user.

    The daemon actually just makes calls to libecryptfs, which implements request
    packet parsing and manages PKI modules. libecryptfs grabs the public key
    authentication token for the given signature from the user session keyring.
    This auth tok tells libecryptfs which PKI module should receive the request.
    libecryptfs then makes a decrypt() call to the PKI module, and it passes along
    the PKI blob from the auth tok. The PKI uses the blob to figure out how it
    should decrypt the data passed to it; it performs the decryption and passes
    the decrypted data back to libecryptfs. libecryptfs then puts together a
    reply packet with the decrypted FEK and passes that back to the eCryptfs
    module.

    The eCryptfs module manages these request callouts to userspace code via
    message context structs. The module maintains an array of message context
    structs and places the elements of the array on two lists: a free and an
    allocated list. When eCryptfs wants to make a request, it moves a msg ctx
    from the free list to the allocated list, sets its state to pending, and fires
    off the message to the user's registered daemon.

    When eCryptfs receives a netlink message (via the callback), it correlates the
    msg ctx struct in the alloc list with the data in the message itself. The
    msg->index contains the offset into the array of msg ctx structs. It verifies
    that the registered daemon PID is the same as the PID of the process that sent
    the message. It also validates a sequence number between the received packet
    and the msg ctx. Then, it copies the contents of the message (the reply
    packet) into the msg ctx struct, sets the state in the msg ctx to done, and
    wakes up the process that was sleeping while waiting for the reply.

    The sleeping process was whatever was performing the sys_open(). This process
    originally called ecryptfs_send_message(); it is now in
    ecryptfs_wait_for_response(). When it wakes up and sees that the msg ctx
    state was set to done, it returns a pointer to the message contents (the
    reply packet). If all went well, this packet contains the decrypted
    FEK, which is then copied into the crypt_stat struct, and life continues as
    normal.

    The case for creation of a new file is very similar, only instead of a decrypt
    request, eCryptfs sends out an encrypt request.

    > - We have a great clod of key mangement code in-kernel. Why is that
    > not suitable (or growable) for public key management?

    eCryptfs uses Howells' keyring to store persistent key data and PKI state
    information. It defers public key cryptographic transformations to userspace
    code. The userspace data manipulation request really is orthogonal to key
    management in and of itself. What eCryptfs basically needs is a secure way to
    communicate with a particular daemon for a particular task doing a syscall,
    based on the UID. Nothing running under another UID should be able to access
    that channel of communication.

    > - Is it appropriate that new infrastructure for public key
    > management be private to a particular fs?

    The messaging.c file contains a lot of code that, perhaps, could be extracted
    into a separate kernel service. In essence, this would be a sort of
    request/reply mechanism that would involve a userspace daemon. I am not aware
    of anything that does quite what eCryptfs does, so I was not aware of any
    existing tools to do just what we wanted.

    > What happens if one of these daemons exits without sending a quit
    > message?

    There is a stale uid-to-pid association in the hash table for that user. When
    the user registers a new daemon, eCryptfs cleans up the old association and
    generates a new one. See ecryptfs_process_helo().

    > - _why_ does it use netlink?

    Netlink provides the transport mechanism that would minimize the complexity of
    the implementation, given that we can have multiple daemons (one per user). I
    explored the possibility of using relayfs, but that would involve having to
    introduce control channels and a protocol for creating and tearing down
    channels for the daemons. We do not have to worry about any of that with
    netlink.

    Signed-off-by: Michael Halcrow
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
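
    As an illustration of the message context handling described above (not the
    eCryptfs code itself), the following is a minimal single-threaded sketch of
    an array of msg ctx structs with a free list and an allocated list, plus the
    checks applied to an incoming reply. All names, sizes and the reply layout
    are hypothetical; locking, sleeping and wakeup, and returning a ctx to the
    free list are left out.

```c
#include <stdint.h>
#include <string.h>

#define NUM_CTX   4   /* assumed array size */
#define MAX_REPLY 64  /* assumed maximum reply packet size */

enum ctx_state { CTX_FREE, CTX_PENDING, CTX_DONE };

struct msg_ctx {
    enum ctx_state state;
    uint32_t seq;              /* sequence number of the outstanding request */
    int next;                  /* next index on the list holding this ctx, -1 ends */
    size_t reply_len;
    uint8_t reply[MAX_REPLY];
};

/* Hypothetical layout of a reply arriving from the daemon. */
struct reply_msg {
    uint32_t index;            /* offset into the msg ctx array */
    uint32_t seq;
    size_t len;
    uint8_t data[MAX_REPLY];
};

static struct msg_ctx ctx_arr[NUM_CTX];
static int free_head = -1, alloc_head = -1;
static uint32_t next_seq = 1;

/* Put every element of the array on the free list. */
static void ctx_init(void)
{
    free_head = alloc_head = -1;
    for (int i = NUM_CTX - 1; i >= 0; i--) {
        ctx_arr[i].state = CTX_FREE;
        ctx_arr[i].next = free_head;
        free_head = i;
    }
}

/* Move a ctx from the free list to the allocated list and mark it
 * pending. Returns its index, or -1 if no ctx is free. */
static int ctx_acquire(void)
{
    int i = free_head;

    if (i < 0)
        return -1;
    free_head = ctx_arr[i].next;
    ctx_arr[i].next = alloc_head;
    alloc_head = i;
    ctx_arr[i].state = CTX_PENDING;
    ctx_arr[i].seq = next_seq++;
    return i;
}

/* Validate a reply and copy it into its ctx. Returns 0 if accepted,
 * -1 if the sender, index, state, sequence number or length is wrong. */
static int ctx_process_reply(const struct reply_msg *m,
                             int sender_pid, int daemon_pid)
{
    struct msg_ctx *c;

    if (sender_pid != daemon_pid || m->index >= NUM_CTX)
        return -1;
    c = &ctx_arr[m->index];
    if (c->state != CTX_PENDING || c->seq != m->seq || m->len > MAX_REPLY)
        return -1;
    memcpy(c->reply, m->data, m->len);
    c->reply_len = m->len;
    c->state = CTX_DONE;       /* real code would wake the waiting process here */
    return 0;
}
```

    Checking the sender PID and the sequence number before touching the ctx is
    what keeps a stale or forged reply from completing someone else's request.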