| User Opinions |
100%
0%
(3 votes)
|
|
Thank you for rating this answer.
|
Distributed hash tables (DHTs) are a class of decentralized distributed systems that partition ownership of a set of keys among participating nodes, and can efficiently route messages to the unique owner of any given key. Each node is analogous to an array slot in a hash table. DHTs are typically designed to scale to large numbers of nodes and to handle continual node arrivals and failures. This infrastructure can be used to build more complex services, such as distributed file systems, peer-to-peer file sharing systems, cooperative web caching, multicast, anycast, domain name services, and instant messaging.
Background
DHT research was originally motivated, in part, by peer-to-peer systems such as Napster, Gnutella, and Freenet, which took advantage of resources distributed across the Internet to provide a single useful application. In particular, they took advantage of increased bandwidth and hard disk capacity to provide a file sharing service.
These systems differed in how they found the data their peers contained. Napster had a central index server: each node, upon joining, would send a list of locally held files to the server, which would perform searches and refer the querier to the nodes that held the results. This central component left the system vulnerable to attacks and lawsuits. Gnutella and similar networks moved to a flooding query model — in essence, each search would result in a message being broadcast to every other machine in the network. While avoiding a single point of failure, this method was significantly less efficient than Napster. Finally, Freenet was also fully distributed, but employed a heuristic key based routing in which each file was associated with a key, and files with similar keys tended to cluster on a similar set of nodes. Queries were likely to be routed through the network to such a cluster without needing to visit many peers. However, Freenet did not guarantee that data would be found.
Distributed hash tables use a more structured key based routing in order to attain both the decentralization of Gnutella and Freenet, and the efficiency and guaranteed results of Napster. One drawback is that, like Freenet, DHTs only directly support exact-match search, rather than keyword search, although that functionality can be layered on top of a DHT.
The first four DHTs—CAN, Chord, Pastry, and Tapestry—were introduced about the same time in 2001. Since then this area of research has been quite active. Outside academia, DHT technology has been adopted as a component of BitTorrent and in the Coral Content Distribution Network
Properties
DHTs characteristically emphasize the following properties:
Decentralisation: the nodes collectively form the system without any central coordination. Scalability: the system should function efficiently even with thousands or millions of nodes. Fault tolerance: the system should be reliable (in some sense) even with nodes continuously joining, leaving, and failing.
A key technique used to achieve these goals is that any one node needs to coordinate with only a few other nodes in the system -- most commonly, Θ(logn) of the n participants (see below) -- so that only a limited amount of work needs to be done for each change in membership.
Some DHT designs seek to be secure against malicious participants and to allow participants to remain anonymous, though this is less common than in many other peer-to-peer (especially file sharing) systems; see anonymous P2P.
Finally, DHTs must deal with more traditional distributed systems issues such as load balance, data integrity, and performance (in particular, ensuring that operations such as routing and data storage or retrieval complete quickly).
Structure
A DHT is built around an abstract keyspace, such as the set of 160-bit strings. Ownership of the keyspace is split among the participating nodes according to a keyspace partitioning scheme. The overlay network connects the nodes, allowing them to find the owner of any given key in the keyspace. (This design decomposition has been suggested in (Naor and Wieder, 2003) and (Manku, 2004).)
Once these components are in place, a typical use of the DHT for storage and retrieval might proceed as follows. Suppose the keyspace is the set of 160-bit strings. To store a file with given filename and data in the DHT, the SHA1 hash of filename is found, producing a 160-bit key k. Thereafter, a message put(k,data) may be sent to any node participating in the DHT. The message is forwarded from node to node through the overlay network until it reaches the single node responsible for key k as specified by the keyspace partitioning, where the pair (k,data) is stored. Any other client can then retrieve the contents of the file by again hashing filename to produce k and asking any DHT node to find the data associated with k with a message get(k). The message will again be routed through the overlay to the node responsible for k, which will reply with the stored data.

This article is licensed under the GNU Free Documentation License (GFDL). It uses material from the Wikipedia article Distributed Hash Table. More on Wikipedia.
|