Adaptive Push-Pull: Disseminating Dynamic Web Data

Pavan Deolasee, Amol Katkar, Krithi Ramamritham • Indian Institute of Technology Bombay • Dept. of CS, University of Massachusetts

Outline • Introduction • Background description • Push vs. Pull • PAP algorithm • POP algorithm • Performance • Conclusion

Introduction • A crucial issue in the dissemination of time-varying web data is the maintenance of temporal coherency. • Clients pull data based on the dynamics of the data and the user’s coherency requirements. • Servers with push capability maintain state information about their clients and push only those changes that are of interest to a user.

Introduction (cont.) • The two techniques have complementary properties with respect to temporal coherency, communication overheads, state space overheads, and loss of coherency due to server failure. • This paper shows how to combine push- and pull-based techniques to achieve the best features of both. • The experimental results demonstrate that such adaptive data dissemination is essential to meet diverse temporal coherency requirements and to be resilient to failures.

Background description • Time-varying data: data that changes frequently. • Coherency requirements depend on user tolerance. E.g., a user may want stronger coherency for data items such as stock prices than for news information. • Maintaining temporal coherency on the web • Users specify a temporal coherency requirement (tcr) for each data item of interest. • tcr: the maximum permissible deviation of the cached value from the value at the server. A tcr can be specified in units of time or value (e.g., 5 minutes or 1 dollar).

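Background description • |U(t) – S(t)| ≦ c, where U(t) is the value known to the user at time t, S(t) is the value at the server, and c is the tcr. • Fidelity f: the total length of time, as observed by a user, for which the above inequality holds. • We focus on two issues: user-specified coherency and fidelity requirements.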

Background description • The need for combining push and pull • Push-based: suitable when a client requires its coherency requirements to be satisfied with high fidelity, or when communication overheads are the bottleneck. • Pull-based: suited to less frequently changing data or less stringent coherency requirements, and to cases where resiliency to failures is important. • The goal is to combine push and pull in an adaptive manner while offering good resiliency and scalability.

Push vs. Pull • Pull • A Time To Refresh (TTR) attribute is associated with each data item. • The TTR denotes the next time the proxy should poll the server to refresh the data item if it has changed in the interim. • Rapidly changing data results in a smaller TTR. • The proxy pulls only those changes that are of interest to the user; it need not pull every single change.

Pull • Adaptive TTR approach • [TTRmin, TTRmax]: the range of TTR values. • TTRmr: the smallest TTR value used so far. • TTRdyn: computed from the two most recent changes so as to reflect likely future changes: TTRdyn = (w × TTRestimate) + ((1 − w) × TTRlatest) • TTRestimate = (expression missing from the source) • Weight w: 0.5 ≦ w < 1, initially 0.5

Pull • Computing TTRadaptive: TTRadaptive = Max(TTRmin, Min(TTRmax, a × TTRmr + (1 − a) × TTRdyn)), 0 ≦ a ≦ 1 • The higher the desired fidelity, the higher the value of a.
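The TTRdyn and TTRadaptive formulas can be combined into a short Python sketch. The function and parameter names are illustrative and do not come from the slides; `ttr_estimate` is taken as an input because its formula is not given here, and the defaults simply reuse the initial w = 0.5 and the a = 0.9 used in the experiments.

```python
def adaptive_ttr(ttr_estimate, ttr_latest, ttr_mr, ttr_min, ttr_max, w=0.5, a=0.9):
    """Return the next TTR in seconds (illustrative sketch, not the authors' code).

    ttr_estimate : supplied by the caller; its formula is not given in the slides
    ttr_latest   : the most recently used TTR
    ttr_mr       : the smallest TTR used so far
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    # TTRdyn = (w x TTRestimate) + ((1 - w) x TTRlatest)
    ttr_dyn = w * ttr_estimate + (1 - w) * ttr_latest
    # Blend the most conservative TTR seen so far with the dynamic estimate.
    blended = a * ttr_mr + (1 - a) * ttr_dyn
    # Clamp the result to [TTRmin, TTRmax].
    return max(ttr_min, min(ttr_max, blended))
```

With a close to 1 the result stays near TTRmr, so the proxy polls conservatively; with a close to 0 it follows TTRdyn and adapts faster to recent changes.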

Push • The proxy registers with a server, identifying the data of interest and the associated tcr, i.e., the value c. • Whenever the value of the data changes, the server uses the tcr value c to determine whether the new value should be pushed to the proxy. • The current value Di is pushed if and only if |Di − Dk| ≧ c, 0 < k < i, where Dk is the last value that was pushed to the proxy. • The server must maintain state: the list of proxies and data items, the tcr of each proxy, and the last update sent to each proxy.

Push (cont.) • Key advantage: push can meet stringent coherency requirements, since the server is aware of every change.
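The server-side push rule can be illustrated with a minimal Python sketch. The function name is hypothetical, and sending the very first value unconditionally (so the proxy has a baseline to compare against) is an assumption, not something stated in the slides.

```python
def pushed_values(values, c):
    """Return the values a push server would send for coherency tolerance c.

    A value is pushed only if it differs from the last pushed value by at least c.
    """
    pushed = []
    for v in values:
        # Assumption: the first value is always sent to establish a baseline.
        if not pushed or abs(v - pushed[-1]) >= c:
            pushed.append(v)
    return pushed


# Example: a short stock-price trace with c = $0.50.
trace = [10.0, 10.25, 10.5, 10.75, 11.5]
print(pushed_values(trace, 0.5))  # [10.0, 10.5, 11.5]
```

Only three of the five changes reach the proxy, which is the per-client filtering that requires the server to remember the last value pushed to each proxy.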

Performance of Pull vs. Push • Experimental model • Pull: a vanilla HTTP web server with a prototype proxy. • Push: a prototype server that uses unicast, connection-oriented sockets to push data to proxies. • Both run on a local intranet. • Traces used • Real-world stock price streams serve as the dynamic data. • Each trace is 2 hours long and contains approximately 15,000 data values.

Performance of Pull vs. Push • The pull approach was evaluated using the adaptive TTR algorithm with a = 0.9, TTRmin = 1 sec, and three TTRmax values of 10, 30 and 60 seconds.

Performance of Pull vs. Push • Maintenance of temporal coherency • A push-based server is well suited to achieving a fidelity value of 1. • For a pull-based server, the frequency of the pulls (i.e., the assignment of TTR values) determines the degree to which client needs are met. • We quantify the fidelity of the pull-based approach as the probability that the user’s tcr is met. • This is done by measuring the durations when |U(t) − S(t)| > c. • Let δ1, δ2, …, δn denote these durations.

Performance of Pull vs. Push • The fidelity is expressed as a percentage: fidelity = (1 − (δ1 + δ2 + … + δn) / Observed_period) × 100 • Observed_period is the total time for which data was observed by a user.
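As a worked example, the following Python sketch computes a fidelity percentage under the assumption that fidelity is the share of the observed period during which the tcr was not violated. The function name and the input values are illustrative, not taken from the experiments.

```python
def fidelity_percent(violation_durations, observed_period):
    """Percentage of the observed period during which |U(t) - S(t)| <= c held.

    violation_durations : lengths of the intervals where the tcr was violated
    observed_period     : total observation time, in the same unit
    """
    if observed_period <= 0:
        raise ValueError("observed_period must be positive")
    violated = sum(violation_durations)
    return (1 - violated / observed_period) * 100


# Example: 900 seconds of violations within a 3600-second observation window.
print(fidelity_percent([900], 3600))  # 75.0
```

A push-based server that never misses a change of interest would have no violation intervals and therefore a fidelity of 100%.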

Pull-based algorithm with adaptive TTRs

Performance of Pull vs. Push • Communication overheads • In a push approach, the number of messages transferred over the network equals the number of times the user is informed of data changes. • A pull approach requires two messages per poll: an HTTP request and its response. • We quantify communication overheads in terms of the number of messages exchanged between server and proxy.

Coherency requirement: $0.05 ≦ c ≦ $0.4

Performance of Pull vs. Push • Resiliency • By virtue of being stateless, a pull-based server is resilient to failures. • In contrast, a push-based server maintains crucial state information about the needs of its clients; this state is lost when the server fails.

PAP approach • PAP (Push and Pull): simultaneously employs both push and pull. • The client registers with the server and informs it of its coherency requirement tcr. • The client pulls, using algorithm A to decide its TTR value. • The server pushes new data values to the client. • If the server fails, the client continues to be served as well as pull algorithm A allows.

PAP approach • The server uses an approximation to compute the client’s next TTR. • Let diff = T(i) − T(i−1), where T(i) is the time of the client’s i-th (most recent) poll. • The server predicts the next client polling time as tpredict = T(i) + diff. • In practice, the server should allow the client to pull data if the changes of interest to the client occur close to the client’s expected pulling time. • The server therefore waits for a duration of ε for the client to pull. • If a client doesn’t pull when the server expects it to, the server extends the push duration by adding (diff − ε) to tpredict.
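The prediction step can be sketched in Python as follows. This is one possible reading, not the authors' implementation: the function names are hypothetical, and "close to the expected pulling time" is interpreted here as "within ε of tpredict", which is an assumption.

```python
def predict_next_poll(t_prev, t_last):
    """Return (diff, t_predict) from the two most recent poll times T(i-1) and T(i)."""
    diff = t_last - t_prev
    return diff, t_last + diff


def server_should_push(change_time, t_predict, epsilon):
    """Decide whether the server pushes a change of interest.

    Assumption: a change within epsilon of the predicted poll time is left
    for the client to pull; any other change is pushed.
    """
    return abs(change_time - t_predict) > epsilon


def extend_after_missed_poll(t_predict, diff, epsilon):
    """New prediction when the client did not poll at the expected time."""
    return t_predict + (diff - epsilon)


# Example: polls at t = 0 s and t = 10 s, with epsilon = 2 s.
diff, t_predict = predict_next_poll(0.0, 10.0)        # diff = 10.0, t_predict = 20.0
print(server_should_push(15.0, t_predict, 2.0))       # True: far from the next poll
print(server_should_push(19.0, t_predict, 2.0))       # False: the client is about to pull
print(extend_after_missed_poll(t_predict, diff, 2.0)) # 28.0
```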

PAP approach • If a series of rapid changes occurs during (ti, ti+1), the probability that some violations occur in (ti, ti+1) is very high, and thus these changes will also be pushed by the server. • This in turn forces a decrease in the TTR at the proxy, causing frequent polls from the proxy. • So the TTR value at the proxy tends towards TTRmin, and diff approaches zero. • This makes the durations of possible pushes from the server close to zero.

POP approach • POP (Push or Pull): dynamically chooses between push and pull. A server categorizes each of its clients as either a push client or a pull client, and this categorization can change with system dynamics. • Basic ideas: • Allow failures at the server to be detected early so that, if possible, clients can switch to pulls. • Servers are designed to push data values when one of two conditions is met. • A server can be designed to provide push service as the default to all clients, provided it has sufficient resources.

POP approach • If a new request requires 100% fidelity and the server doesn’t have sufficient resources to satisfy it, the server takes steps to convert some push clients to pull. If this conversion is not possible, the new request is denied.
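The admission logic described above might look like the following Python sketch. Several details are assumptions made only for illustration: resources are modeled as a fixed number of push slots, only push clients that tolerate less than 100% fidelity are treated as convertible, and a new client that tolerates less than 100% fidelity is served by pull when no slot is free.

```python
def admit(push_fidelities, capacity, new_fidelity):
    """Decide how to serve a new client (hypothetical resource model).

    push_fidelities : fidelity requirement (0..1) of each current push client
    capacity        : maximum number of push clients the server can hold
    new_fidelity    : fidelity requirement of the new client
    Returns (decision, updated_push_fidelities), where decision is
    "push", "pull" or "denied".
    """
    clients = list(push_fidelities)
    if len(clients) < capacity:
        # Push is the default while resources are sufficient.
        return "push", clients + [new_fidelity]
    if new_fidelity < 1.0:
        # Assumption: a client tolerating < 100% fidelity can be served by pull.
        return "pull", clients
    # Assumption: only push clients tolerating < 100% fidelity are convertible.
    convertible = [f for f in clients if f < 1.0]
    if not convertible:
        return "denied", clients
    # Convert the least demanding push client to pull and admit the new one.
    clients.remove(min(convertible))
    return "push", clients + [new_fidelity]


print(admit([1.0, 0.9], 2, 1.0))  # ('push', [1.0, 1.0])
print(admit([1.0, 1.0], 2, 1.0))  # ('denied', [1.0, 1.0])
```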

Conclusion • Combining push and pull approaches • Goal: maintaining user-specified coherency and fidelity requirements. • Intelligent: the ability to dynamically choose the most efficient set of mechanisms to service each application. • Adaptive: adapting a particular mechanism to changing network and workload characteristics.
