<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no"?>
<?rfc strict="yes"?>
<?rfc compact="yes"?>
<rfc category="info" docName="draft-ietf-dhc-dhcp-privacy-04"
     ipr="trust200902">
  <front>
    <title abbrev="DHCP Privacy considerations">Privacy considerations for
    DHCP</title>

    <author fullname="Sheng Jiang" initials="S." surname="Jiang">
      <organization>Huawei Technologies Co., Ltd</organization>

      <address>
        <postal>
          <street>Q14, Huawei Campus, No.156 Beiqing Road</street>

          <city>Hai-Dian District, Beijing, 100095</city>

          <country>P.R. China</country>
        </postal>

        <email>jiangsheng@huawei.com</email>
      </address>
    </author>

    <author fullname="Suresh Krishnan" initials="S." surname="Krishnan">
      <organization>Ericsson</organization>

      <address>
        <postal>
          <street>8400 Decarie Blvd.</street>

          <city>Town of Mount Royal</city>

          <region>QC</region>

          <country>Canada</country>
        </postal>

        <phone>+1 514 345 7900 x42871</phone>

        <email>suresh.krishnan@ericsson.com</email>
      </address>
    </author>

    <author fullname="Tomek Mrugalski" initials="T." surname="Mrugalski">
      <organization abbrev="ISC">Internet Systems Consortium,
      Inc.</organization>

      <address>
        <postal>
          <street>950 Charter Street</street>

          <city>Redwood City</city>

          <region>CA</region>

          <code>94063</code>

          <country>USA</country>
        </postal>

        <email>tomasz.mrugalski@gmail.com</email>
      </address>
    </author>

    <date />

    <area>Internet</area>

    <workgroup>dhc</workgroup>

    <keyword>DHCP Privacy</keyword>

    <abstract>
      <t>DHCP is a protocol that is used to provide addressing and
      configuration information to IPv4 hosts. This document discusses the
      various identifiers used by DHCP and the potential privacy issues.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="intro" title="Introduction">
      <t>Dynamic Host Configuration Protocol (DHCP) <xref
      target="RFC2131"></xref> is a protocol that is used to provide
      addressing and configuration information to IPv4 hosts. DHCP uses
      several identifiers that could become a source for gleaning information
      about the IPv4 host. This information may include device type, operating
      system information, location(s) that the device may have previously
      visited, etc. This document discusses the various identifiers used by
      DHCP and the potential privacy issues <xref target="RFC6973"></xref>. In
      particular, it also takes into consideration the problem of pervasive
      monitoring <xref target="RFC7258"></xref>.</t>

      <t>Future work may propose protocol changes to fix the privacy issues
      analyzed in this document. Such changes are out of scope for this
      document.</t>

      <t>The primary focus of this document is on privacy considerations for
      clients, in support of client mobility and connection to random
      networks. The privacy of DHCP servers and relay agents is considered
      less important, as they are typically open for public services. It is
      also generally assumed that relay-agent-to-server communication is
      protected from casual snooping, as that communication occurs in the
      provider's backbone. Topics involving relay agents and servers are
      nevertheless explored to some degree, and future work may want to
      explore the privacy of DHCP servers and relay agents further.</t>
    </section>

    <section title="Requirements Language and Terminology">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119"></xref>. When these words are not in ALL CAPS (such as
      "should" or "Should"), they have their usual English meanings, and are
      not to be interpreted as <xref target="RFC2119"></xref> key words.</t>

      <t>In addition, the following terminology is used: <list hangIndent="8"
          style="hanging">
          <t hangText="Stable identifier">- Any property disclosed by a DHCP
          client that does not change over time or changes very infrequently
          and is unique for said client in a given context. Examples include
          MAC address, client-id, and a hostname. Some identifiers may be
          considered stable only under certain conditions, for example one
          client implementation may keep its client-id stored in stable
          storage while another may generate it on the fly and use a different
          one after each boot. Stable identifiers may or may not be globally
          unique.</t>
        </list></t>
    </section>

    <section title="DHCP Options Carrying Identifiers">
      <t>In DHCP, there are a few options that contain identification
      information or that can be used to extract identification information
      about the client. This section enumerates various options and the
      identifiers conveyed in them, which can be used to disclose client
      identification. They are targets of various attacks that are analyzed in
      <xref target="attacks"></xref>.</t>

      <section title="Client Identifier Option">
        <t>The Client Identifier Option <xref target="RFC2131"></xref> is used
        to pass an explicit client identifier to a DHCP server.</t>

        <t>The client identifier is an opaque key, which must be unique to
        that client within the subnet to which the client is attached. It
        typically remains stable after it has been initially generated. It may
        contain a hardware address, identical to the contents of the 'chaddr'
        field, or another type of identifier, such as a DNS name. Section 9.2
        of <xref target="RFC3315"></xref> specifies DUID-LLT (Link-layer
        address plus time) as the recommended DUID (DHCP Unique Identifier)
        type, and Section 6.1 of <xref target="RFC4361"></xref> introduces
        this concept to DHCP. Both documents recommend that client identifiers
        be generated from the permanent link-layer address of the network
        interface that the client is trying to configure. <xref
        target="RFC4361"></xref> updates the recommended format of the Client
        Identifier so that it "consists of a type field whose value is
        normally 255, followed by a four-byte IA_ID field, followed by the
        DUID for the client as defined in RFC 3315, section 9". This does not
        change the lifecycle of Client Identifiers. Clients are expected to
        generate their Client Identifier once (during first operation) and
        either store it in non-volatile storage or use the same deterministic
        algorithm to generate the same value again.</t>

        <t>This means that most implementations will use the link-layer
        address that is available during their first boot. Even if the
        administrator enables link-layer address randomization, it is likely
        that it was not yet enabled during the first boot of the device. Hence
        the original, unobfuscated link-layer address will likely end up being
        announced as the client identifier, even if the link-layer address has
        since changed (or is being changed on a periodic basis). The exposure
        of the original link-layer address in the client identifier will also
        undermine other privacy extensions such as <xref
        target="RFC4941"></xref>.</t>
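        <t>The following Python sketch illustrates why a client identifier
        built this way exposes the first-boot link-layer address. It assumes
        the DUID-LLT layout (two-byte type 1, two-byte hardware type,
        four-byte time counted from 2000-01-01 UTC, then the link-layer
        address) placed after the type field 255 and the four-byte IA_ID
        quoted above. The function names and the sample address are
        hypothetical and are not taken from any implementation.</t>

        <figure>
          <artwork><![CDATA[
```python
import struct

# Offset between the Unix epoch and 2000-01-01 00:00:00 UTC, in seconds.
DUID_EPOCH_OFFSET = 946684800


def build_duid_llt(mac: bytes, unix_time: int, hw_type: int = 1) -> bytes:
    """DUID-LLT: type 1, hardware type, 32-bit time, link-layer address."""
    duid_time = (unix_time - DUID_EPOCH_OFFSET) % 2**32
    return struct.pack("!HHI", 1, hw_type, duid_time) + mac


def build_client_id(ia_id: bytes, duid: bytes) -> bytes:
    """Type field 255, four-byte IA_ID, then the DUID."""
    if len(ia_id) != 4:
        raise ValueError("IA_ID must be four bytes")
    return bytes([255]) + ia_id + duid


def embedded_link_layer_address(client_id: bytes) -> bytes:
    """Skip type (1), IA_ID (4) and the DUID-LLT header (8)."""
    return client_id[13:]


# Hypothetical address captured at first boot, before randomization.
first_boot_mac = bytes.fromhex("001122334455")
client_id = build_client_id(
    b"\x00\x00\x00\x01", build_duid_llt(first_boot_mac, 1_400_000_000)
)
print(embedded_link_layer_address(client_id).hex(":"))
```
]]></artwork>
        </figure>

        <t>Because the identifier is stored and reused, an observer can read
        the same trailing bytes from it on every network, regardless of which
        link-layer address the interface currently uses.</t>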
      </section>

      <section title="Address Fields &amp; Options">
        <t>The 'yiaddr' field <xref target="RFC2131"></xref> in a DHCP message
        is used to convey an allocated address from the server to the
        client.</t>

        <t>The DHCP specification <xref target="RFC2131"></xref> provides a
        way to specify the client link-layer address in the DHCP message
        header. A DHCP message header has 'htype' and 'chaddr' fields to
        specify the client link-layer address type and the link-layer address,
        respectively. The 'chaddr' field is used both as a hardware address
        for transmission of reply messages and as a client identifier.</t>

        <t>The 'requested IP address' option <xref target="RFC2131"></xref> is
        used by a client to suggest that a particular IP address be
        assigned.</t>
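        <t>As a rough illustration of how little effort it takes to read
        these fields, the sketch below pulls 'htype', 'yiaddr', and 'chaddr'
        out of the fixed-format part of a DHCP message. It assumes the
        236-byte header layout of <xref target="RFC2131"></xref>; the
        function name and the sample values are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
import ipaddress
import struct


def parse_fixed_header(packet: bytes) -> dict:
    """Extract the address-bearing fields of the fixed DHCP header."""
    op, htype, hlen, hops, xid, secs, flags = struct.unpack(
        "!BBBBIHH", packet[:12]
    )
    yiaddr = packet[16:20]            # follows the 4-byte 'ciaddr'
    chaddr = packet[28:28 + hlen]     # only the first hlen bytes are used
    return {
        "htype": htype,
        "yiaddr": str(ipaddress.IPv4Address(yiaddr)),
        "chaddr": chaddr.hex(":"),
    }


# Hypothetical reply: an Ethernet client is offered 192.0.2.10.
header = bytearray(236)
header[0:3] = bytes([2, 1, 6])        # op=BOOTREPLY, htype=1, hlen=6
header[16:20] = ipaddress.IPv4Address("192.0.2.10").packed
header[28:34] = bytes.fromhex("001122334455")
fields = parse_fixed_header(bytes(header))
print(fields)
```
]]></artwork>
        </figure>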
      </section>

      <section title="Client FQDN Option">
        <t>The Client Fully Qualified Domain Name (FQDN) option <xref
        target="RFC4702"></xref> is used by DHCP clients and servers to
        exchange information about the client's fully qualified domain name
        and about who has the responsibility for updating the DNS with the
        associated A and PTR RRs.</t>

        <t>A client can use this option to convey all or part of its domain
        name to a DHCP server for the IP-address-to-FQDN mapping. In most
        cases, a client sends its hostname as a hint for the server. The DHCP
        server MAY be configured to modify the supplied name or to substitute
        a different name. The server should send its notion of the complete
        FQDN for the client in the Domain Name field.</t>
      </section>

      <section title="Parameter Request List Option">
        <t>The Parameter Request List option <xref target="RFC2131"></xref> is
        used to inform the server about options the client wants the server to
        send to the client. The contents of a Parameter Request List option
        are the option codes for options requested by the client.</t>
      </section>

      <section title="Vendor Class and Vendor-Identifying Vendor Class Options">
        <t>The Vendor Class option <xref target="RFC2131"></xref>, the
        Vendor-Identifying Vendor Class option, and the Vendor-Identifying
        Vendor Information option <xref target="RFC3925"></xref> are used by
        the DHCP client to identify the vendor that manufactured the hardware
        on which the client is running.</t>

        <t>The data area of these options carries one or more opaque fields
        that identify details of the hardware configuration of the host on
        which the client is running, or of industry consortium compliance;
        for example, the version of the operating system the client is
        running or the amount of memory installed on the client.</t>
      </section>

      <section title="Civic Location Option">
        <t>DHCP servers use the Civic Location Option <xref
        target="RFC4776"></xref> to deliver location information (the civic
        and postal addresses) to DHCP clients. It may refer to three
        locations: the location of the DHCP server, the location of the
        network element believed to be closest to the client, or the location
        of the client, identified by the "what" element within the option.</t>
      </section>

      <section title="Coordinate-Based Location Option">
        <t>The GeoConf and GeoLoc options <xref target="RFC6225"></xref> are
        used by a DHCP server to provide coordinate-based geographic location
        information to DHCP clients. They enable a DHCP client to obtain its
        geographic location.</t>
      </section>

      <section title="Client System Architecture Type Option">
        <t>The Client System Architecture Type Option <xref
        target="RFC4578"></xref> is used by a DHCP client to send a list of
        supported architecture types to the DHCP server. It is used by clients
        that must be booted using the network rather than from local storage,
        so the server can decide which boot file should be provided to the
        client.</t>

        <t><!--Client Network Interface Identifier Option and Client Machine 
        Identifier Option seems not in use. Together with the Client System 
        Architecture Type Option, they are used for the Intel Preboot eXecution 
        Environment (PXE). Client Network Interface Identifier Option provide 
        information about the level of UNDI support.--></t>
      </section>

      <section title="Relay Agent Information Option and Sub-options">
        <t>A DHCP relay agent includes a Relay Agent Information option <xref
        target="RFC3046"></xref> to identify the remote host end of the
        circuit. It contains a "circuit ID" sub-option for the incoming
        circuit, which is an agent-local identifier of the circuit from which
        a DHCP client-to-server packet was received, and a "remote ID"
        sub-option which provides a trusted identifier for the remote
        high-speed modem.</t>

        <t>Possible encodings of the "circuit ID" sub-option include: router
        interface number, switching hub port number, remote access server port
        number, frame relay DLCI, ATM virtual circuit number, cable data
        virtual circuit number, etc.</t>

        <t>Possible encodings of the "remote ID" sub-option include: a "caller
        ID" telephone number for dial-up connection, a "user name" prompted
        for by a remote access server, a remote caller ATM address, a "modem
        ID" of a cable data modem, the remote IP address of a point-to-point
        link, a remote X.25 address for X.25 connections, etc.</t>

        <t>The link-selection sub-option <xref target="RFC3527"></xref> is
        used by any DHCP relay agent that desires to specify a subnet/link for
        a DHCP client request that it is relaying but needs the subnet/link
        specification to be different from the IP address the DHCP server
        should use when communicating with the relay agent. It contains an IP
        address, which can identify the client's subnet/link. Given knowledge
        of the network topology, it also reveals the client's location.</t>

        <t>A DHCP relay includes a Subscriber-ID option <xref
        target="RFC3993"></xref> to associate some provider-specific
        information with clients' DHCP messages that is independent of the
        physical network configuration through which the subscriber is
        connected. The "subscriber-id" assigned by the provider is intended to
        be stable as customers connect through different paths, and as network
        changes occur. The Subscriber-ID is an ASCII string, which is assigned
        and configured by the network provider.</t>
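        <t>To illustrate how directly these values can be read by anyone who
        sees relayed traffic, the sketch below walks the code/length/value
        sub-options inside a Relay Agent Information option payload. It
        assumes sub-option codes 1 (circuit ID) and 2 (remote ID) as defined
        in <xref target="RFC3046"></xref>; the function name and the sample
        values are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
# Sub-option codes for circuit ID and remote ID; others are left numeric.
SUBOPTION_NAMES = {1: "circuit-id", 2: "remote-id"}


def parse_relay_agent_info(payload: bytes) -> dict:
    """Split the option payload into its code/length/value sub-options."""
    result = {}
    pos = 0
    while pos + 2 <= len(payload):
        code, length = payload[pos], payload[pos + 1]
        value = payload[pos + 2:pos + 2 + length]
        name = SUBOPTION_NAMES.get(code, f"sub-option-{code}")
        result[name] = value
        pos += 2 + length
    return result


# Hypothetical payload: a port name and a modem identifier.
payload = bytes([1, 7]) + b"eth0/12" + bytes([2, 10]) + b"modem-0042"
info = parse_relay_agent_info(payload)
print(info)
```
]]></artwork>
        </figure>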
      </section>
    </section>

    <section title="Existing Mechanisms That Affect Privacy">
      <t>This section describes deployed DHCP mechanisms that affect
      privacy.</t>

      <section title="DNS Updates">
        <t>The Client FQDN (Fully Qualified Domain Name) Option <xref
        target="RFC4702"></xref>, used along with DNS Updates <xref
        target="RFC2136"></xref>, defines a mechanism that allows both clients
        and servers to insert information about clients into the DNS.
        Both forward (A) and reverse (PTR) resource records can be updated.
        This allows other nodes to conveniently refer to a host, despite the
        fact that its IP address may be changing.</t>

        <t>This mechanism exposes two important pieces of information: the
        client's current address (which can be mapped to its current location)
        and the client's hostname. The stable hostname can then be used to
        correlate the client across different network attachments even when
        its IP addresses keep changing.</t>
      </section>

      <section title="Allocation strategies">
        <t>A DHCP server running in typical, stateful mode is tasked with
        managing one or more pools of IP addresses. When a client requests an
        address, the server must pick an address out of a configured pool.
        Depending on the server's implementation, various allocation
        strategies are possible. Choices in this regard may have privacy
        implications. Note that the constraints in DHCP and DHCPv6 are
        radically different, but servers that allow allocation strategy
        configuration may allow configuring them in both DHCP and DHCPv6. Not
        every allocation strategy is equally suitable for DHCP and for
        DHCPv6.</t>

        <t>Iterative allocation - a server may choose to allocate addresses
        one by one. That strategy has the benefit of being very fast, thus
        being favored in deployments that prefer performance. However, it
        makes the allocated addresses very predictable. Also, since the
        addresses allocated tend to be clustered at the beginning of an
        available pool, it makes scanning attacks much easier.</t>

        <t>Identifier-based allocation - some server implementations may
        choose to allocate an address that is based on one of the available
        identifiers, e.g., client identifier or MAC address. As a result, a
        returning client is very likely to get the same address. That property
        is convenient for system administrators, so DHCP server implementors
        are often asked to implement it. The downside of such allocation is
        that the client has a very stable IP address. That means that
        correlation of activities over time, location tracking, address
        scanning, and OS/vendor discovery apply. This is certainly an issue in
        DHCPv6, but due to a much smaller address space it is almost never a
        problem in DHCP.</t>

        <t>Hash allocation - an extension of identifier-based allocation.
        Instead of using the identifier directly, the server hashes it first.
        If the hash is implemented correctly, it removes the flaw of
        disclosing the identifier, a property that eliminates susceptibility
        to address scanning and OS/vendor discovery. If the hash is poorly
        implemented (e.g., it can be reversed), it introduces no improvement
        over identifier-based allocation.</t>

        <t>Random allocation - a server can pick a resource randomly out of an
        available pool. This allocation scheme essentially prevents returning
        clients from getting the same address again. On the other hand, it is
        beneficial from a privacy perspective as addresses generated that way
        are not susceptible to correlation attacks, OS/vendor discovery
        attacks, or identity discovery attacks. Note that even though the
        address itself may be resilient to a given attack, the client may
        still be susceptible if additional information is disclosed in another
        way, e.g., the client's address may be randomized, but the client can
        still leak its MAC address in the client-id option.</t>

        <t>Other allocation strategies may be implemented.</t>

        <t>Given the limited size of most IPv4 public address pools,
        allocation mechanisms in IPv4 may neither provide much privacy
        protection nor, if misused, leak much useful information.</t>
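        <t>The allocation strategies described in this section can be
        contrasted with a minimal Python sketch. It is an illustration under
        stated assumptions rather than a description of any server: the pool
        is a small documentation prefix, collisions between clients are not
        handled, and the use of a keyed hash (HMAC-SHA-256 with a
        server-local key) for hash allocation is an assumption made here, not
        something this document prescribes. All function names are
        hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
import hashlib
import hmac
import ipaddress
import secrets

# Hypothetical pool: the 14 host addresses of a documentation prefix.
POOL = list(ipaddress.ip_network("192.0.2.0/28").hosts())


def iterative(leased):
    """First free address: fast, but predictable and clustered."""
    return next(addr for addr in POOL if addr not in leased)


def identifier_based(mac: bytes):
    """Derived directly from the identifier: stable and guessable."""
    return POOL[int.from_bytes(mac, "big") % len(POOL)]


def hashed(mac: bytes, server_key: bytes):
    """Keyed hash of the identifier: stable, but opaque without key."""
    digest = hmac.new(server_key, mac, hashlib.sha256).digest()
    return POOL[int.from_bytes(digest[:8], "big") % len(POOL)]


def random_allocation(leased):
    """Uniformly random free address: no link to any identifier."""
    free = [addr for addr in POOL if addr not in leased]
    return secrets.choice(free)


mac = bytes.fromhex("001122334455")
key = b"hypothetical-server-key"
print(iterative(set()), identifier_based(mac), hashed(mac, key))
print(random_allocation({POOL[0]}))
```
]]></artwork>
        </figure>

        <t>In this sketch, only the random strategy gives a returning client
        no better than chance odds of receiving its previous address; the
        other three are deterministic for a given pool state or
        identifier.</t>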
      </section>
    </section>

    <section anchor="attacks" title="Attacks">
      <section title="Device type discovery">
        <t>The type of device used by the client can be guessed by the
        attacker using the Vendor Class Option or the 'chaddr' field, or by
        parsing the Client ID Option. Any of these may contain an
        Organizationally Unique Identifier (OUI) that represents the device's
        vendor. That knowledge can be used for device-specific vulnerability
        exploitation attacks.</t>
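        <t>A minimal sketch of this kind of lookup, assuming the attacker has
        already extracted a link-layer address from 'chaddr' or from the
        client identifier, is shown below. The vendor table is a made-up
        placeholder. The check of the locally administered bit is an addition
        made here to show that a randomized address carries no vendor OUI;
        all names are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
# Hypothetical registry excerpt; a real lookup would use the IEEE list.
KNOWN_OUIS = {"00:11:22": "Example Vendor"}


def oui_of(mac: str) -> str:
    """First three octets of a colon- or hyphen-separated address."""
    octets = mac.lower().replace("-", ":").split(":")
    return ":".join(octets[:3])


def is_locally_administered(mac: str) -> bool:
    """True when the locally administered bit (0x02) of octet 0 is set."""
    first_octet = int(oui_of(mac).split(":")[0], 16)
    return bool(first_octet & 0x02)


def guess_vendor(mac: str) -> str:
    if is_locally_administered(mac):
        return "unknown (locally administered address)"
    return KNOWN_OUIS.get(oui_of(mac), "unknown")


print(guess_vendor("00:11:22:33:44:55"))
print(guess_vendor("02-11-22-33-44-55"))
```
]]></artwork>
        </figure>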
      </section>

      <section title="Operating system discovery">
        <t>The operating system running on a client can be guessed using the
        Vendor Class option, the Client System Architecture Type option, or by
        using fingerprinting techniques on the combination of options
        requested using the Parameter Request List option.</t>
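        <t>Fingerprinting on the Parameter Request List can be sketched as a
        simple table lookup keyed on the exact sequence of requested option
        codes. The table below is entirely hypothetical: the code lists are
        invented for illustration and do not describe any real operating
        system.</t>

        <figure>
          <artwork><![CDATA[
```python
# Hypothetical fingerprint table. The option lists below are invented
# for illustration and do not describe any real operating system.
FINGERPRINTS = {
    (1, 3, 6, 15): "hypothetical OS A",
    (1, 3, 6, 12, 15, 28): "hypothetical OS B",
    (1, 28, 3, 6, 15, 12): "hypothetical OS C",
}


def guess_os(parameter_request_list: bytes) -> str:
    """Match the exact sequence of requested option codes."""
    return FINGERPRINTS.get(tuple(parameter_request_list), "unknown")


print(guess_os(bytes([1, 3, 6, 12, 15, 28])))
print(guess_os(bytes([1, 28, 3, 6, 15, 12])))
```
]]></artwork>
        </figure>

        <t>The second and third entries request the same set of options in a
        different order, so in this sketch the ordering alone is enough to
        tell them apart.</t>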
      </section>

      <section title="Finding location information">
        <t>The location information can be obtained by the attacker by many
        means. The most direct way to obtain this information is by looking
        into a message originating from the server that contains the Civic
        Location, GeoConf, or GeoLoc options. It can also be indirectly
        inferred using the Relay Agent Information option, through the remote
        ID sub-option, the circuit ID sub-option (e.g., if an access circuit
        on an Access Node corresponds to a civic location), or the Subscriber
        ID Option (if the attacker has access to subscriber info).</t>
      </section>

      <section title="Finding previously visited networks">
        <t>When DHCP clients connect to a network, they attempt to obtain the
        same address they used the last time they were attached to a network.
        They do this by putting the previously assigned address in the
        requested IP address option. By observing these addresses, an attacker
        can identify the network the client previously visited.</t>
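        <t>A passive observer needs very little code to recover that address.
        The sketch below walks the code/length/value options of a captured
        message and returns the value of the requested IP address option,
        assuming option code 50 for that option, 0 for padding, and 255 for
        the end marker. The function names and the sample bytes are
        hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
import ipaddress

REQUESTED_IP = 50     # requested IP address option
PAD, END = 0, 255     # single-byte options without a length field


def iter_options(options: bytes):
    """Yield (code, value) pairs from a DHCP options field."""
    pos = 0
    while pos < len(options):
        code = options[pos]
        if code == END:
            return
        if code == PAD:
            pos += 1
            continue
        if pos + 1 >= len(options):
            return                      # truncated option, stop parsing
        length = options[pos + 1]
        yield code, options[pos + 2:pos + 2 + length]
        pos += 2 + length


def previously_used_address(options: bytes):
    """Return the requested address, or None if the option is absent."""
    for code, value in iter_options(options):
        if code == REQUESTED_IP and len(value) == 4:
            return ipaddress.IPv4Address(value)
    return None


# Hypothetical request asking for 198.51.100.23 again.
options = bytes([53, 1, 3, 50, 4, 198, 51, 100, 23, 255])
print(previously_used_address(options))
```
]]></artwork>
        </figure>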
      </section>

      <section title="Finding a stable identity">
        <t>An attacker might use a stable identity gleaned from DHCP messages
        to correlate activities of a given client on unrelated networks. The
        Client FQDN option, the Subscriber ID option, and the Client ID option
        can serve as long-lived identifiers of DHCP clients. The Client FQDN
        option can also provide an identity that can easily be correlated with
        web server activity logs.</t>
      </section>

      <section title="Pervasive monitoring">
        <t>This is an enhancement, or a combination, of most of the
        aforementioned mechanisms. An operator who controls a non-trivial
        number of access points or network segments may use information
        obtained about a single client to observe the client's habits.
        Although users may not expect true privacy from their operators, the
        information that is set up to be monitored by users' service operators
        may also be gathered by an adversary who monitors a wide range of
        networks and develops correlations from that information.</t>
      </section>

      <section title="Finding client's IP address or hostname">
        <t>Many DHCP deployments use DNS Updates <xref
        target="RFC4702"></xref> that put a client's information (current IP
        address, client's hostname) into the DNS, where it is easily
        accessible by anyone interested. Client ID is also disclosed, albeit
        in a not easily accessible form (SHA-256 digest of the client-id).
        As SHA-256 is considered irreversible, the digest can't be converted
        back to the client-id. However, the SHA-256 digest can be used as a
        unique identifier that is accessible by any host.</t>
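        <t>The point that an irreversible digest still acts as an identifier
        can be illustrated with a short sketch. It is deliberately
        simplified: it hashes only the client-id, whereas the record actually
        published in the DNS may cover additional inputs that are ignored
        here. The function name and the sample identifiers are
        hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
import hashlib


def published_digest(client_id: bytes) -> str:
    """Simplified stand-in for the digest stored in the DNS."""
    return hashlib.sha256(client_id).hexdigest()


# Hypothetical client identifiers of two different devices.
laptop = bytes.fromhex("ff00000001000100010203001122334455")
phone = bytes.fromhex("ff000000010001000102030066778899aa")

seen_on_monday = published_digest(laptop)
seen_on_friday = published_digest(laptop)

# The digest hides the client-id but is identical on every sighting.
print(seen_on_monday == seen_on_friday)
print(seen_on_monday == published_digest(phone))
```
]]></artwork>
        </figure>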
      </section>

      <section title="Correlation of activities over time">
        <t>As with other identifiers, an IP address can be used to correlate
        the activities of a host for at least as long as the lifetime of the
        address. If that address was generated from some other, stable
        identifier and that generation scheme can be deduced by an attacker,
        the duration of the correlation attack extends to that of the
        identifier. In many cases, its lifetime is equal to the lifetime of
        the device itself.</t>
      </section>

      <section title="Location tracking">
        <t>If a stable identifier is used for assigning an address and such
        mapping is discovered by an attacker, it can be used for tracking a
        user. In particular both passive (a service that the client connects
        to can log the client's address and draw conclusions regarding its
        location and movement patterns based on the addresses it is connecting
        from) and active (an attacker can send ICMP echo requests or other
        probe packets to networks of suspected client locations) methods can
        be used. To give a specific example, by accessing a social portal from
        tomek-laptop.coffee.somecity.com.example,
        tomek-laptop.mycompany.com.example and tomek-laptop.myisp.example.com,
        the portal administrator can draw conclusions about tomek-laptop's
        owner's current location and habits.</t>
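        <t>The passive variant of this attack amounts to grouping log entries
        by the stable leftmost label of the hostname. The sketch below shows
        one way a portal operator might do that; the log format, the
        timestamps, and the function name are hypothetical, while the
        hostnames are the ones used in the example above.</t>

        <figure>
          <artwork><![CDATA[
```python
from collections import defaultdict


def movements_by_host(log):
    """Group (timestamp, fqdn) entries by the leftmost hostname label."""
    seen = defaultdict(list)
    for timestamp, fqdn in log:
        host, _, domain = fqdn.partition(".")
        seen[host].append((timestamp, domain))
    return dict(seen)


# Hypothetical access log of reverse-resolved client names.
log = [
    ("08:05", "tomek-laptop.coffee.somecity.com.example"),
    ("09:30", "tomek-laptop.mycompany.com.example"),
    ("19:45", "tomek-laptop.myisp.example.com"),
]
for host, visits in movements_by_host(log).items():
    print(host, visits)
```
]]></artwork>
        </figure>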
      </section>

      <section title="Leasequery &amp; bulk leasequery">
        <t>Attackers may pretend to be an access concentrator, either as a
        DHCP relay agent or as a DHCP client, to obtain location information
        directly from the DHCP server(s) using the DHCP leasequery <xref
        target="RFC4388"></xref> mechanism.</t>

        <t>Location information is information needed by the access
        concentrator to forward traffic to a broadband-accessible host. This
        information includes knowledge of the host hardware address, the port
        or virtual circuit that leads to the host, and/or the hardware address
        of the intervening subscriber modem.</t>

        <t>Furthermore, the attackers may use the DHCP bulk leasequery <xref
        target="RFC6926"></xref> mechanism to obtain bulk information about
        DHCP bindings, even without knowing the target bindings.</t>

        <t>Additionally, active leasequery <xref target="RFC7724"></xref> is a
        mechanism for subscribing to DHCP lease update changes in near
        real-time. The intent of this mechanism is to update an operator's
        database, but if misused, an attacker could defeat the server's
        authentication mechanisms and subscribe to all updates. The attacker
        could then continue receiving updates, without any need for local
        presence.</t>
      </section>
    </section>

    <section anchor="security" title="Security Considerations">
      <t>In current practice, client privacy and client authentication are
      mutually exclusive. The client authentication procedure reveals
      additional client information in the certificates/identifiers used. Full
      privacy for the clients may mean the clients are also anonymous to the
      server and the network.</t>
    </section>

    <section anchor="privacy-consider" title="Privacy Considerations">
      <t>This document in its entirety discusses privacy considerations in
      DHCP. As such, no dedicated discussion is needed.</t>
    </section>

    <section title="IANA Considerations">
      <t>This draft does not request any IANA action.</t>
    </section>

    <section anchor="acks" title="Acknowledgements">
      <t>The authors would like to thank Stephen Farrell, Ted Lemon, Ines
      Robles, Russ White, Christian Huitema, Bernie Volz, Jinmei Tatuya,
      Marcin Siodelski, Christian Schaefer, and other members of the DHC WG
      for their valuable comments.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include='reference.RFC.2119'?>

      <?rfc include='reference.RFC.2131'?>

      <?rfc include='reference.RFC.2136'?>

      <?rfc include='reference.RFC.6973'?>

      <?rfc include='reference.RFC.7258'?>
    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.3046'?>

      <?rfc include='reference.RFC.3315'?>

      <?rfc include='reference.RFC.3527'?>

      <?rfc include='reference.RFC.3925'?>

      <?rfc include='reference.RFC.3993'?>

      <?rfc include='reference.RFC.4361'?>

      <?rfc include='reference.RFC.4388'?>

      <?rfc include='reference.RFC.4578'?>

      <?rfc include='reference.RFC.4702'?>

      <?rfc include='reference.RFC.4776'?>

      <?rfc include='reference.RFC.4941'?>

      <?rfc include='reference.RFC.6225'?>

      <?rfc include='reference.RFC.6926'?>

      <?rfc include='reference.RFC.7724'?>
    </references>
  </back>
</rfc>