Freenet: A Distributed Anonymous Information Storage and Retrieval System

Freenet: A Distributed Anonymous Information Storage and Retrieval System (김 훈 규, Ph.D. program, 6th semester)

Topics • Overview • Related work • Architecture • Keys and Searching • Retrieving data • Storing data • Managing data • Adding nodes • Protocol Details • Performance analysis • Security • Conclusion

Overview (1/2) • What is Freenet? • A P2P application designed to guarantee true freedom of communication over the Internet • Anyone can publish or obtain information under complete anonymity • Nobody controls Freenet, not even its creators • Communication between Freenet nodes is encrypted • Requests are routed through other nodes to make it difficult to determine who is requesting information and what its content is • Who is behind Freenet? • Originally, Ian Clarke, while a student at the University of Edinburgh, Scotland • Still supervised by Ian Clarke, though many other people contribute to the project

Overview (2/2) • Purpose • Prevent information censorship • Maintain personal privacy • Design Goals • Anonymity for information producers, consumers, and holders • Deniability for storers of information • Resistance to information censorship • High availability and reliability through decentralization • Efficient, scalable, and adaptive storage and routing

Related work • File-sharing • Gnutella, FastTrack, Overnet • Consumer Anonymity • Anonymizer, SafeWeb/Triangle Boy • Producer Anonymity • Rewebber, TAZ, Publius • Shared-storage • OceanStore, Cooperative File System, PAST

Architecture (1/2) • Peer-to-peer network • Each node keeps a dynamic routing table and a local datastore (LRU) • [Figure: datastore entries file (a) to file (d) ordered from new to old, with a "data deleted" label]

Architecture (2/2) • A cooperative distributed filesystem incorporating location independence and transparent lazy replication • Basic Model • Key requests are passed from node to node through a chain of proxy requests • Each node decides locally where to send the request next • Routing algorithm : adjusts routes over time to provide efficient performance • Request • hops-to-live : prevents infinite chains • pseudo-unique random identifier : lets nodes prevent loops by rejecting requests they have seen before • the result is passed back up the chain to the sending node • No node is privileged : no hierarchy or central point of failure

Keys and searching • Files in Freenet are identified by binary file keys • obtained by applying a hash function : 160-bit SHA-1 • Key types • Keyword-signed key (KSK) • simplest type of file key • derived from a short descriptive text string chosen by the user • Signed-subspace key (SSK) • Generated with a public key and (usually) a text description, signed with the private key • Can be used as a sort of private namespace • Description e.g. politics/us/pentagon-papers • Content-hash key (CHK) • Used primarily for data storage • Generated by hashing the content

Keyword-signed key (KSK) • Example: KSK@plays/shakespeare/Coriolanus • <keyword> -> key pair (keypub, keypriv) • SHA(keypub) = KSK • keypriv(File) : signature, minimal integrity check • E(<keyword>, File) : encryption • To retrieve the file • the user need only publish the keyword • Problematic flat global namespace • Nothing prevents two users from independently choosing the same keyword for different files

Signed-subspace key (SSK) • Example: SSK@rBjVda8...cPAgM/TFE// • S-keypub, S-keypriv : randomly generated key pair of the subspace • SHA(SHA(<keyword>) XOR SHA(S-keypub)) = SSK • S-keypriv(File) : signature • E(<keyword>, File) : encryption • To retrieve the file • the user need only publish the keyword together with the subspace's public key • Storing data • requires the private key • only the owner of the subspace can add files to it
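The SSK derivation on this slide (hash the keyword and the public key separately, XOR the two digests, hash again) can be sketched with Python's `hashlib`. This is a minimal sketch, not Freenet's implementation: the function name `ssk_file_key` is hypothetical, the public key is treated as opaque bytes, and no signing or encryption is performed.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    """Return the 160-bit SHA-1 digest of data."""
    return hashlib.sha1(data).digest()


def ssk_file_key(subspace_pub: bytes, description: str) -> bytes:
    """Compute SHA(SHA(description) XOR SHA(public key)).

    subspace_pub is an opaque byte string standing in for a real public key.
    """
    h_desc = sha1(description.encode("utf-8"))
    h_pub = sha1(subspace_pub)
    mixed = bytes(a ^ b for a, b in zip(h_desc, h_pub))
    return sha1(mixed)


# Placeholder bytes, NOT a real key: an assumption for illustration only.
demo_pub = b"demo-subspace-public-key"
demo_key = ssk_file_key(demo_pub, "politics/us/pentagon-papers")
```

Because both inputs are hashed before mixing, a reader who knows the keyword and the subspace's public key can recompute the same file key, while inserting under it would still require the private key for the signature.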

Content-hash key (CHK) (1/2) • Example: CHK@WX0fa7GU...MAwI,jMQymYuK • SHA(File) = CHK • E(File) : encryption with Keyencrypt, a randomly generated key • Useful for implementing updating and splitting • To retrieve the file • the user publishes the content-hash key itself together with the decryption key • Most useful in conjunction with signed-subspace keys using an indirection mechanism • To store an updatable file • insert the file under its content-hash key • insert an indirect file under a signed-subspace key whose contents are the content-hash key

Content-hash key (CHK) (2/2) • To update a file • insert a new version (new CHK) • insert a new indirect file (original SSK) • Splitting files • Desirable because of storage and bandwidth limitations • Advantageous for network traffic • Each part is inserted separately under its own CHK, and an indirect file (or multiple levels of indirect files) is created to point to the parts • Problem of finding keys in the first place • hypertext spider : conflicts with the design goal (decentralization) • create a special class of lightweight indirect files • when inserting a file, also insert a series of indirect files holding pointers to the file • create compilations of favorite keys and publicize them (e.g. on the WWW)
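The indirection, update, and splitting mechanisms above can be sketched with a plain dictionary standing in for the network. All names here (`store`, `publish`, `fetch`, `insert_split`, `fetch_split`) are hypothetical; encryption and the SSK signature check are omitted, and keys are hex-encoded for readability, which is not Freenet's actual key encoding.

```python
import hashlib

# Hypothetical stand-in for the whole network: key string -> stored bytes.
store: dict[str, bytes] = {}


def insert_chk(content: bytes) -> str:
    """Store content under the hash of its own bytes and return that key."""
    key = "CHK@" + hashlib.sha1(content).hexdigest()
    store[key] = content
    return key


def publish(ssk_name: str, content: bytes) -> str:
    """Insert content under its CHK, then point the SSK indirect file at it."""
    key = insert_chk(content)
    store[ssk_name] = key.encode("ascii")
    return key


def fetch(ssk_name: str) -> bytes:
    """Follow the indirect file stored under the SSK to the current CHK."""
    pointer = store[ssk_name].decode("ascii")
    return store[pointer]


def insert_split(content: bytes, part_size: int) -> str:
    """Insert each part under its own CHK plus one indirect file listing them."""
    parts = [content[i:i + part_size] for i in range(0, len(content), part_size)]
    part_keys = [insert_chk(part) for part in parts]
    return insert_chk("\n".join(part_keys).encode("ascii"))


def fetch_split(index_key: str) -> bytes:
    """Reassemble a split file from its indirect file."""
    part_keys = store[index_key].decode("ascii").splitlines()
    return b"".join(store[k] for k in part_keys)
```

Updating a file is then just a second `publish` call with the same SSK name: the old version stays reachable under its old CHK until nodes drop it, while readers of the SSK see the new one.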

Retrieving data (1/3) • Data request message • Key, Hops-to-live value (HTL), Unique ID • [Flowchart: search for the file in the datastore; if found, send Data Reply; if not found, look up the nearest key in the routing table, check the HTL, send Data Request, and wait for the answer; if the lookup or the HTL check fails, send Request Failed] • Retrieval successful • the node passes the data back to the upstream requestor • it caches the file in its own datastore and creates a new routing table entry associating the actual data source with the requested key

Retrieving data (2/3) • A typical request sequence • [Figure: nodes a to f exchanging messages numbered 1 to 12; legend: Data Request, Data Reply, Request Failed; the request starts at a and the data is held at d] • Data return path : d -> e -> b -> a • Caching nodes : e, b, a
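The request sequence on this slide can be reproduced with a small simulation. It is a simplified sketch under stated assumptions: keys are integers and "nearest" means smallest absolute difference (a stand-in for Freenet's key closeness), the `Node` class and the routing keys assigned to each link are hypothetical, and messages are modelled as direct recursive calls.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    datastore: dict = field(default_factory=dict)  # key -> data
    routing: dict = field(default_factory=dict)    # key -> neighbouring Node
    seen: set = field(default_factory=set)         # request IDs already handled

    def request(self, key, htl, request_id):
        """Return (data, source_node) on success, or None for Request Failed."""
        if request_id in self.seen:
            return None                      # loop detected: reject the request
        self.seen.add(request_id)
        if key in self.datastore:
            return self.datastore[key], self
        if htl <= 0:
            return None                      # hops-to-live used up
        # Try neighbours in order of key closeness, backtracking on failure.
        for known in sorted(self.routing, key=lambda k: abs(k - key)):
            result = self.routing[known].request(key, htl - 1, request_id)
            if result is not None:
                data, source = result
                self.datastore[key] = data   # cache on the return path
                self.routing[key] = source   # remember the data source
                return result
        return None


a, b, c, d, e, f = (Node(n) for n in "abcdef")
d.datastore[100] = b"data"
a.routing = {97: b}
b.routing = {98: c, 90: e}   # c looks closer than e, so it is tried first
e.routing = {99: f, 80: d}   # f looks closer than d, so it is tried first
f.routing = {95: b}          # leads back to b, which rejects the loop

result = a.request(100, htl=10, request_id=1)
```

With this topology the request visits a, b, c (fails), e, f, b again (rejected as a loop), and finally d; the data returns along d -> e -> b -> a and is cached at e, b, and a, while c and f hold no copy.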

Retrieving data (3/3) • Quality of the routing should improve over time • nodes specialize in locating sets of similar keys • nodes specialize in storing clusters of files having similar keys • Transparent replication • popular data is transparently replicated and mirrored closer to requestors • new routing entries are created for previously unknown nodes, increasing connectivity • direct links to data sources are created, bypassing the intermediate nodes used • nodes that successfully supply data gain routing table entries and are contacted more often than nodes that do not

Storing data (1/2) • Insert message • Key, Hops-to-live value (HTL), Unique ID • each node checks its own store to see if the key already exists • [Flowchart: check whether the key already exists; if found, return a reply and the inserter tries again using a different key; if not found, look up the nearest key in the routing table, check the HTL, send the insert message, and wait for the answer; when the HTL is used up, send "all clear"]

Storing data (2/2) • All clear : successful result • the hops-to-live limit is reached without a key collision being detected -> the result is propagated back to the original inserter • the user then sends the data to insert, which is propagated along the path and stored in each node along the way • each node creates an entry in its routing table for the new key
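The two insert phases (collision probe, then data propagation after "all clear") can be sketched as follows. This is a simplified illustration, not Freenet's implementation: the `Node` class, `probe`, and `insert` are hypothetical names, keys are integers with absolute difference as closeness, loop detection is left out, and the new routing entries are assumed to name the inserter as data source.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    datastore: dict = field(default_factory=dict)  # key -> data
    routing: dict = field(default_factory=dict)    # key -> next Node


def probe(node, key, htl, path):
    """Phase 1: follow the nearest keys; return the colliding node or None."""
    path.append(node)
    if key in node.datastore:
        return node                      # key collision
    if htl <= 0 or not node.routing:
        return None                      # "all clear"
    nearest = min(node.routing, key=lambda k: abs(k - key))
    return probe(node.routing[nearest], key, htl - 1, path)


def insert(start, key, data, htl):
    """Return True if the data was stored along the path, False on collision."""
    path = []
    if probe(start, key, htl, path) is not None:
        return False                     # inserter must retry with another key
    for node in path:                    # Phase 2: data follows the same path
        node.datastore[key] = data
        if node is not start:
            node.routing[key] = start    # assumed data source: the inserter
    return True


a, b, c = Node("a"), Node("b"), Node("c")
a.routing = {40: b}
b.routing = {60: c}
first = insert(a, 50, b"v1", htl=2)      # stored at a, b and c
second = insert(a, 50, b"v2", htl=2)     # same key again: collision at a
```

A failed insert leaves every datastore untouched, which matches the slide's rule that the user must try again under a different key.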

Managing data • Storage : LRU cache • If the datastore is full, the least recently used files are evicted • There is no permanent copy • Once all nodes have decided to drop a particular file, it is no longer available to the network • Advantage of the expiration mechanism • allows outdated documents to fade away naturally after being superseded by newer documents • Encrypted content • node operators do not explicitly know the content they store • Encryption is not intended to secure the file but to keep the node operator from knowing the stored content
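The LRU eviction rule can be illustrated with `collections.OrderedDict`. This is a minimal sketch: the class name `LRUDatastore` and its capacity measured in number of files are assumptions for illustration, not Freenet's actual storage layout.

```python
from collections import OrderedDict


class LRUDatastore:
    """Keeps at most `capacity` files; evicts the least recently used one."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._files = OrderedDict()      # oldest entry first, newest last

    def get(self, key):
        """Return the stored data and mark it as recently used, or None."""
        if key not in self._files:
            return None
        self._files.move_to_end(key)
        return self._files[key]

    def put(self, key, data):
        """Store data as the most recent entry, evicting old files if full."""
        self._files[key] = data
        self._files.move_to_end(key)
        while len(self._files) > self.capacity:
            self._files.popitem(last=False)   # drop the least recently used

    def __contains__(self, key):
        return key in self._files


ds = LRUDatastore(capacity=2)
ds.put("file-a", b"A")
ds.put("file-b", b"B")
ds.get("file-a")          # file-a becomes the most recently used
ds.put("file-c", b"C")    # store is full, so file-b is evicted
```

Files that keep being requested stay near the "new" end and survive, while files nobody asks for drift to the "old" end and eventually disappear from the node.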

Adding nodes • New nodes must announce their presence • Two conflicting requirements • For routing efficiency, all nodes must be consistent in deciding which keys to send to the new node • For security, the most straightforward way to achieve consistency, letting a single node choose the routing key, is ruled out • Use a cryptographic protocol • The new node announces its public key and physical address (e.g. IP) to an existing node • The announcement is recursively forwarded to random nodes • Nodes in the chain then collectively assign the new node a random GUID
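One way a chain of nodes can agree on a random value that no single node controls is a commit-then-reveal scheme: each node first publishes a hash of its secret seed, then reveals the seed, and the result is the XOR of all seeds. The sketch below illustrates that idea only; the slide does not spell out the protocol, so the function names and the exact message flow are assumptions.

```python
import hashlib
import secrets

SEED_BYTES = 20  # 160 bits, matching the SHA-1 key size used elsewhere


def commit(seed: bytes) -> bytes:
    """Commitment a node publishes before revealing its seed."""
    return hashlib.sha1(seed).digest()


def assign_guid(seeds, commitments) -> bytes:
    """Verify every revealed seed against its commitment, then XOR them all."""
    if len(seeds) != len(commitments):
        raise ValueError("one commitment per seed is required")
    guid = bytes(SEED_BYTES)
    for seed, commitment in zip(seeds, commitments):
        if commit(seed) != commitment:
            raise ValueError("revealed seed does not match its commitment")
        guid = bytes(x ^ y for x, y in zip(guid, seed))
    return guid


# Three nodes in the announcement chain each pick a secret random seed.
chain_seeds = [secrets.token_bytes(SEED_BYTES) for _ in range(3)]
chain_commitments = [commit(s) for s in chain_seeds]
new_node_guid = assign_guid(chain_seeds, chain_commitments)
```

Because every seed is committed before any is revealed, a node cannot pick its seed after seeing the others, so the XOR stays unpredictable as long as at least one participant chooses honestly.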

Protocol details (1/2) • Packet-oriented and uses self-contained messages • For efficiency, nodes use a persistent channel (TCP) • Node address = IP address + port number • Nodes with frequently changing IPs use ARKs • signed-subspace keys updated to contain the current real address • Transaction messages • Request.Handshake • specifies the desired return address of the sending node • Reply.Handshake • specifies the protocol version number • Handshakes are remembered for a few hours

Protocol details (2/2) • To request data • Request.Data (Trans ID, HTL, depth, search key) • Reply.Restart • Send.Data : when the request succeeds • Reply.NotFound : when the request fails and the HTL is completely used up • these messages terminate the transaction and release any resources held • Request.Continue : carries the remaining HTL • To insert data • Request.Insert (Random Trans ID, HTL, depth, Key)
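The fields of Request.Data can be modelled as a small immutable record. This sketch is illustrative only: the class `RequestData` and the helper `forward` are hypothetical, and the rule that each hop decrements the HTL and increments the depth is stated here as an assumption, since the slide lists the fields without their update rules.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RequestData:
    trans_id: int      # transaction ID shared by all messages of one request
    htl: int           # hops-to-live remaining
    depth: int         # hops travelled so far
    search_key: bytes  # binary file key being requested


def forward(msg: RequestData) -> Optional[RequestData]:
    """Return the message to send to the next hop, or None if no hops remain.

    Assumption: each hop decrements htl and increments depth.
    """
    if msg.htl <= 0:
        return None    # the node would answer with Reply.NotFound instead
    return replace(msg, htl=msg.htl - 1, depth=msg.depth + 1)


first_hop = RequestData(trans_id=42, htl=2, depth=0, search_key=b"\x01" * 20)
second_hop = forward(first_hop)
third_hop = forward(second_hop)
```

Keeping the record frozen means every hop builds a fresh message rather than mutating the one it received, which mirrors the slide's description of self-contained messages.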

Performance analysis (1/4) • Topology • 1,000-node network, datastore = 500 items, routing table size = 250 addresses • keys associated with links are hashes of destination IPs • Network convergence

Performance analysis (2/4) • Scalability • Key assigned to new nodes = H(IP) • Path length scales as log(n) up to 40,000 nodes • at 40,000 nodes, routing tables (RTs) are full

Performance analysis (3/4) • Fault-tolerance • Median path length < 20 at 30% node failure • network becomes ineffective at 40% failure

Performance analysis (4/4) • Small-world model • Most nodes form local clusters • A few highly connected nodes link the clusters • Power-law distribution of links provides a high degree of fault tolerance

Security • Protect the anonymity of requestors and inserters of files • Key anonymity (receiver anonymity) • Sender anonymity : local eavesdropping • Anonymity of storers : encrypted contents • Pre-routing • Messages are encrypted with public keys that determine the pre-routing path • Protecting the data source • using random and probabilistic methods • Malicious modification : countered by hash keys and signatures • Denial of service : an attacker inserts a large number of junk files to fill up storage

Conclusion • Provides a network to anonymously store and request files • Adaptive routing whose efficiency increases with experience • Deals with privacy and data integrity in various scenarios • Freenet is an ongoing project that still has plenty of flaws • There may be a tradeoff between network efficiency on one side and anonymity and robustness on the other • More information at http://freenetproject.org/
