ABSTRACT
Peer-to-peer computing consists of an open-ended network of distributed computational peers, where each peer shares data and services with a set of other peers, called its acquaintances. The peer-to-peer paradigm was initially popularized by file-sharing systems such as Napster and Gnutella, but its basic ideas and principles have now found their way into more critical and complex data-sharing applications like those for electronic medical records and scientific data. In such environments, data sharing poses new challenges mainly due to the lack of centralized control, the transient nature of inter-peer connections, and the limited, ever-changing cooperation among the peers.
In this seminar, I present new solutions for data sharing and querying in a peer-to-peer data management system, that is, a peer-to-peer system where each peer manages its own database. The solutions are motivated by the data-sharing requirements of independent biological data sources. To support data sharing in such a setting, I propose the use of mapping tables containing pairs of corresponding data values that reside in different peers. I illustrate how automated tools can help manage the tables by checking their consistency and by inferring new tables from existing ones. To support structured querying, I propose a framework in which local user queries are translated, through mapping tables, to a set of queries over the acquainted peers. Finally, I present optimization techniques that enable efficient rewriting even over large mapping tables. The proposed mechanisms have been implemented and evaluated experimentally, and they constitute the foundation of a prototype implementation of an architecture for peer-to-peer data management.
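The mapping-table idea can be illustrated with a minimal sketch. It assumes a mapping table is simply a set of value pairs between two peers; the identifiers (`G1`, `P10`, `S100`), the table names, and the two helper functions are hypothetical and chosen only for illustration, and the mapping tables proposed in the seminar may carry richer semantics than plain pairs.

```python
from collections import defaultdict

# Hypothetical mapping tables: each pair links a value stored at one peer
# to the corresponding value stored at an acquainted peer.
GENE_TO_PROTEIN = {("G1", "P10"), ("G1", "P11"), ("G2", "P20")}
PROTEIN_TO_STRUCTURE = {("P10", "S100"), ("P20", "S200")}


def translate_selection(values, table):
    """Rewrite a local selection (attribute IN values) into the set of
    corresponding values to select on at the acquainted peer."""
    return sorted({right for left, right in table if left in values})


def compose(first, second):
    """Infer a new mapping table A->C from tables A->B and B->C by
    joining them on the shared B values."""
    by_left = defaultdict(set)
    for b, c in second:
        by_left[b].add(c)
    return {(a, c) for a, b in first for c in by_left.get(b, ())}


# A local query for gene G1 becomes a query for its mapped proteins.
proteins_for_g1 = translate_selection({"G1"}, GENE_TO_PROTEIN)

# A gene-to-structure table is inferred from the two existing tables.
gene_to_structure = compose(GENE_TO_PROTEIN, PROTEIN_TO_STRUCTURE)
```

In this simplified view, query translation is a lookup through the table and table inference is a join. A pair such as `("G1", "P11")` that has no counterpart in the second table simply contributes nothing to the inferred table.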
The term “peer-to-peer” (P2P) refers to a class of systems and applications that employ distributed resources to perform a function in a decentralized manner. With the pervasive deployment of computers, P2P is increasingly receiving attention in research, product development, and investment circles. Some of the benefits of a P2P approach include: improving scalability by avoiding dependency on centralized points; eliminating the need for costly infrastructure by enabling direct communication among clients; and enabling resource aggregation.
This survey reviews the field of P2P systems and applications by summarizing the key concepts and giving an overview of the most important systems. Design and implementation issues of P2P systems are analyzed in general, and then revisited for eight case studies. This survey will help people in the research community and industry understand the potential benefits of P2P. For people unfamiliar with the field, it provides a general overview as well as detailed case studies. The comparison of P2P solutions with alternative architectures is intended for users, developers, and system administrators (IT).
Introduction
Peer-to-Peer (P2P) computing is a very controversial topic. Many experts believe that there is not much new in P2P. There is a lot of confusion: what really constitutes P2P? For example, is distributed computing really P2P or not? We believe that P2P does warrant a thorough analysis. The goals of the paper are threefold: 1) to understand what P2P is and is not, as well as what is new, 2) to offer a thorough analysis and examples of P2P computing, and 3) to analyze the potential of P2P computing.
The term “peer-to-peer” refers to a class of systems and applications that employ distributed resources to perform a function in a decentralized manner. The resources encompass computing power, data (storage and content), network bandwidth, and presence (computers, humans, and other resources). The critical function can be distributed computing, data/content sharing, communication and collaboration, or platform services. Decentralization may apply to algorithms, data, and meta-data, or to all of them. This does not preclude retaining centralization in some parts of the systems and applications. Typical P2P systems reside on the edge of the Internet or in ad-hoc networks. P2P enables:
• Valuable externalities: by aggregating resources through low-cost interoperability, the whole is made greater than the sum of its parts
• Lower cost of ownership and cost sharing, by using existing infrastructure and by eliminating or distributing the maintenance costs
• Anonymity/privacy, by incorporating these requirements in the design and algorithms of P2P systems and applications, and by allowing peers a greater degree of autonomous control over their data and resources
However, P2P also raises some security concerns for users and accountability concerns for IT. In general, it is still a technology in development, where it is hard to distinguish what is useful from what is hype, and what is new from what is old. In the rest of the paper we evaluate these observations in general as well as for specific P2P systems and applications.
P2P gained visibility with Napster’s support for music sharing on the Web [Napster 2001] and its lawsuit with the music companies. However, it is increasingly becoming an important technique in various areas, such as distributed and collaborative computing both on the Web and in ad-hoc networks. P2P has received the attention of both industry and academia. Some big industrial efforts include the P2P Working Group, led by many industrial partners such as Intel, HP, Sony, and a number of startup companies; and JXTA, an open-source effort led by Sun. There are already a number of books published [Oram 2000, Barkai 2001, Miller 2001, Moore and Hebeler 2001, Fattah and Fattah 2002], and a number of theses and projects in progress at universities, such as Chord [Stoica et al. 2001], OceanStore [Kubiatowicz et al. 2000], PAST [Druschel and Rowstron 2001], CAN [Ratnasamy 2001], and FreeNet [Clark 1999].
Here are several of the definitions of P2P that are being used by the P2P community. The Intel P2P working group defines P2P as “the sharing of computer resources and services by direct exchange between systems” [p2pwg 2001]. David Anderson describes SETI@home and similar P2P projects that do not involve communication between peers as “inverted client-server”, emphasizing that the computers at the edge provide power and those in the middle of the network are there only to coordinate them [Anderson 2002]. Alex Veytsel of Aberdeen defines P2P as “the use of devices on the internet periphery in a non-client capacity” [Veytsel 2001]. Clay Shirky of O’Reilly and Associates uses the following definition: “P2P is a class of applications that takes advantage of resources – storage, cycles, content, human presence – available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, P2P nodes must operate outside the DNS system and have significant or total autonomy from central servers” [Shirky 2001]. Finally, Kindberg defines P2P systems as those with independent lifetimes [Kindberg 2002].
In our view, P2P is about sharing: giving to and obtaining from a peer community. A peer gives some resources and obtains other resources in return. In the case of Napster, it was about offering music to the rest of the community and getting other music in return. It could be donating resources for a good cause, such as searching for extraterrestrial life or combating cancer, where the benefit is obtaining the satisfaction of helping others. P2P is also a way of implementing systems based on the notion of increasing the decentralization of systems, applications, or simply algorithms. It is based on the principles that the world will be connected and widely distributed and that it will not be possible or desirable to leverage everything off of centralized, administratively managed infrastructures. P2P is a way to leverage vast amounts of computing power, storage, and connectivity from personal computers distributed around the world.
Assuming that “peer” is defined as “like each other,” a P2P system then is one in which autonomous peers depend on other autonomous peers. Peers are autonomous when they are not wholly controlled by each other or by the same authority, e.g., the same user. Peers depend on each other for getting information, computing resources, forwarding requests, etc., which are essential for the functioning of the system as a whole and for the benefit of all peers. As a result of the autonomy of peers, they cannot necessarily trust each other or rely completely on the behavior of other peers, so issues of scale and redundancy become much more important than in traditional centralized or distributed systems.