Isis 2 Runtime Parameters

Isis2 Runtime Parameters. Cornell University. Ken Birman. Many features of Isis2 depend on parameters you can modify to “shape” the behavior of the platform. They give you very fine control over the behavior of Isis2. There are three main categories of parameters.

  1. Isis2 Runtime Parameters • Cornell University • Ken Birman

  2. Parameters • Many features of Isis2 depend on parameters you can modify to “shape” the behavior of the platform. • They give you very fine control over the behavior of Isis2. • There are three main categories of parameters: those that determine how the system will start up, those that determine how it sends messages, and those that control limits, timeouts and other bounds

  3. Startup Parameters What happens when you call IsisSystem.Start()?

  4. How IsisSystem.Start() works • The library initializes itself and determines the IP address of “local host.” If the host has several IP addresses, it picks the last of the IPv4 addresses • The system scans the environment variables to read the values of the parameters. These override the default values compiled into Isis2 • In Linux/bash, use “export” to set them, either in .bashrc or in a shell script, or call setenv(3) from code • In Windows, use the “set” command, or call Environment.SetEnvironmentVariable("something", somevalue);
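As a minimal sketch of the programmatic route, the following C# sets parameters as process-level environment variables using only standard .NET calls. The helper class `IsisEnvironmentSketch` and its `Apply` method are hypothetical names chosen for illustration, and the values are the examples that appear on later slides. The `IsisSystem.Start()` call is left as a comment so that the block compiles without the Isis2 library.

```csharp
using System;

public static class IsisEnvironmentSketch
{
    // Hypothetical helper: sets each (name, value) pair as an environment
    // variable of the current process, so that a later IsisSystem.Start()
    // in the same process would see it.
    public static void Apply(params (string Name, string Value)[] settings)
    {
        foreach (var (name, value) in settings)
            Environment.SetEnvironmentVariable(name, value);
    }

    public static void Main()
    {
        Apply(("ISIS_UNICAST_ONLY", "true"),
              ("ISIS_HOSTS", "192.167.54.133,192.167.54.134"));

        // Read one value back to confirm it is visible to this process.
        Console.WriteLine(Environment.GetEnvironmentVariable("ISIS_HOSTS"));

        // IsisSystem.Start();  // would be called here, after the variables are set
    }
}
```

The point of the ordering is that the variables must be in place before `IsisSystem.Start()` scans the environment.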

  5. How IsisSystem.Start() works • Next, the system decides which network interfaces it should use (all of them, unless you tell it otherwise by setting ISIS_NETWORK_INTERFACES) • Do this if you expect to run on machines that have a “production” network and a “management” network • Otherwise leave ISIS_NETWORK_INTERFACES alone • Having done this, it attempts to contact the ORACLE • If the ORACLE isn’t found, it restarts the ORACLE • Otherwise, it asks the ORACLE to let it join the ISISMEMBERS system group

  6. Logging • Normally, upon restart, Isis2 creates a log file for messages printed by the library • You can inhibit this by setting ISIS_MUTE=true • You can also direct that messages be echoed to the Debug stream rather than the Console when calling IsisSystem.Start() • If you allow logging and want to write to the log, call IsisSystem.Write() or IsisSystem.WriteLine() • Output goes to the log plus to Console, or Debug stream

  7. Fast start: But there can only be one… • For extreme speed, you can tell Isis2 not to hunt for the ORACLE (by specifying an argument to IsisSystem.Start) • It will restart instantly, but if you launch two instances this way, they won’t communicate with one another. • So… do this only in the first instance that you launch

  8. Overwhelming the Membership Oracle • If processes start one by one, no issue…. • But what if you try to start 50 at once, or 500? [Figure: starting processes each say “Hello?” to the Oracle, which answers “Welcome!”]

  9. Master/Worker • If a system will be big, launching hundreds of members can overload the ORACLE. • Better performance: add many all at the same time • In this case use the Master/Worker pattern • Master starts first, collects a list of the workers • Workers start after the master and register with it • Then Master can add a batch of workers to the system, and to any groups that are desired

  10. Master: Accumulates workers, tells them what to do
      // GOAL (the number of workers to wait for) and myGroup are assumed to be declared elsewhere
      static void beMaster(string[] args)
      {
          IsisSystem.Start();
          Semaphore waitForWorkers = new Semaphore(0, 1);
          bool fullyStaffed = false;
          List<Address> myWorkers = new List<Address>();
          // Accumulate workers
          IsisSystem.RegisterAsMaster((NewWorker)delegate(Address worker)
          {
              lock (myWorkers)
              {
                  if (fullyStaffed)
                      IsisSystem.RejectWorker(worker);
                  else
                  {
                      myWorkers.Add(worker);
                      if (myWorkers.Count == GOAL)
                      {
                          fullyStaffed = true;
                          waitForWorkers.Release(1);
                      }
                  }
              }
          });
          // Main thread waits until enough workers have connected, then starts them all at once
          waitForWorkers.WaitOne();
          IsisSystem.BatchStart(myWorkers);
          // This delays until they have all finished their batch start
          IsisSystem.WaitForWorkerSetup(myWorkers);
          // Then add them all to the groups we may want to use
          Group.MultiJoin(myWorkers, new Group[] { myGroup });
          // In front of this next line do whatever you want this application to do
          IsisSystem.WaitForever();
          // If the master shuts down, its workers will too
          IsisSystem.Shutdown();
      }

  11. RunAsWorker: Let Master run the show
      // myGroups (the groups this worker prepares) is assumed to be declared elsewhere
      static void beWorker(string[] args)
      {
          // This next line assumes that argument 0 is the master's Address.
          // You can also use new Address(mastersHost, 0) if you know the host IP
          // address of the master but don't know the master's pid.
          IsisSystem.RunAsWorker(args[0]);
          // This line blocks until the master issues the BatchStart() call.
          // Notice that in this one special case we call it AFTER RunAsWorker!
          IsisSystem.Start();
          // Before calling this next line do whatever setup this worker must do:
          // create your group handles and register callbacks, but don't call Join.
          // For example, you might call g = new Group("something"), then call
          // g.ViewHandlers += myViewHandler; etc., anything needed to have the
          // group ready for a Join. But you call WorkerSetupDone() INSTEAD of g.Join().
          IsisSystem.WorkerSetupDone();
          // Now, for each group the Master created using a multijoin, you wait
          // for its first view to be reported. This is one way to do that:
          foreach (Group g in myGroups)
              while (!g.HasFirstView)
                  Thread.Sleep(250);
          // WaitForever would freeze the main thread, but if the worker has joined
          // groups (or gets added to groups by the master using MultiJoin()), the
          // worker could be quite active, receiving messages, sending them, etc.
          IsisSystem.WaitForever();
          // If the master shuts down the worker will throw an
          // IsisException("master termination").
          // If this next line actually executes, this particular worker will exit.
          // (In effect, this worker is a normal Isis application by now, except that
          // if the master terminates, it does too. In particular, it can
          // deliberately choose to leave the system if it wishes to do so.)
          IsisSystem.Shutdown();
      }

  12. Master/Worker Timeline [Figure: parallel Master and Worker timelines, with the Oracle shown alongside]
      • Master: IsisSystem.Start(); … accumulate workers; when the goal is reached, IsisSystem.BatchStart(myWorkers); Group myGroup = new Group("myGroup"); … attach handlers for myGroup, then myGroup.Join(); IsisSystem.WaitForWorkerSetup(myWorkers); once setup is done for all workers, Group.MultiJoin(myWorkers, new Group[] { myGroup }); IsisSystem.WaitForever();
      • Worker: IsisSystem.RunAsWorker(mAddress); IsisSystem.Start(); Group g = new Group("myGroup"); … attach handlers for g, but don’t call Join; IsisSystem.WorkerSetupDone(); foreach (Group g in myGroups) while (!g.HasFirstView) Thread.Sleep(250); when the new view arrives, IsisSystem.WaitForever();

  13. Why does this help? • Workers only send one message to Master • Hence it experiences less load • It adds them all at once, first to the system, then to whatever groups the application will use • Hence only one group view needs to be sent, and it can be sent efficiently, using a broadcast • Overall load is much reduced

  14. Messaging Parameters How to control what internet protocols Isis2 uses

  15. IP multicast / ISIS_UNICAST_ONLY • Isis2 will broadcast to find the ORACLE unless you tell it not to do so. • Default: OK to use IP multicast, UDP, broadcast • ISIS_UNICAST_ONLY: don’t use IP multicast. Still requires UDP (older ISIS_TCP_ONLY feature was eliminated starting in Isis v2.1) • You must list the machines on which Isis2 ORACLE will run if you put the system in ISIS_UNICAST_ONLY mode. ISIS_HOSTS=“…”

  16. Normal versus UNICAST_ONLY • With normal IP multicast, packets are still sent directly • With ISIS_UNICAST_ONLY, packets travel on a tree of point-to-point links and must be forwarded, perhaps log2(N) times [Figure: IP multicast versus a unicast tree with power-of-2 “reach”]
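To make the log2(N) figure concrete, here is a small C# sketch that counts forwarding steps under one simplifying assumption: the set of members that have received the packet doubles at every step. That doubling model is an illustration of the “power of 2 reach” idea, not a description of how Isis2 actually builds its tree, and the class and method names are hypothetical.

```csharp
using System;

public static class UnicastTreeSketch
{
    // Smallest h such that 2^h >= n: the number of forwarding steps needed
    // to reach n members if the reached set doubles at each step.
    public static int WorstCaseHops(int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
        int hops = 0;
        for (long reached = 1; reached < n; reached *= 2)
            hops++;
        return hops;
    }

    public static void Main()
    {
        foreach (int n in new[] { 2, 50, 500, 1024 })
            Console.WriteLine($"{n} members: up to {WorstCaseHops(n)} forwarding steps");
    }
}
```

Under this model, 50 members need up to 6 forwarding steps and 500 members up to 9, which is why forwarding cost grows slowly even for large groups.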

  17. ISIS_HOSTS • Idea is to list the places where the ORACLE can run ISIS_HOSTS=c1.cs.cornell.edu,c2.cs.cornell.edu … or ISIS_HOSTS=192.167.54.133,192.167.54.134 • Processes running on other machines can join the system but can’t restart it from scratch

  18. ISIS_HOSTS: numerical is best! • We have seen bugs in the Linux DNS when accessed from Mono. Sometimes it hangs • To avoid this, use fully numerical IP addresses when you set the values in ISIS_HOSTS • Use the IPv4 addresses for the machines on which you want the ORACLE to run. In this case DNS never hangs • The “ping” and “traceroute” commands are examples of ways you can look these up. • On Windows, string names are fine. On Linux, they work, but don’t put the DNS under heavy load.
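One way to follow this advice is to check an ISIS_HOSTS value before starting the system. The C# sketch below uses only standard .NET networking types; the class `IsisHostsSketch` and its method are hypothetical, and the comma-separated format is taken from the examples on the previous slide.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;

public static class IsisHostsSketch
{
    // True when every comma-separated entry is a dotted-quad IPv4 literal,
    // so that resolving it never requires a DNS lookup.
    public static bool AllNumericIPv4(string hosts)
    {
        if (string.IsNullOrWhiteSpace(hosts)) return false;
        return hosts.Split(',').All(IsDottedQuad);
    }

    private static bool IsDottedQuad(string entry)
    {
        string s = entry.Trim();
        // IPAddress.TryParse also accepts short forms such as "1", so the
        // four dot-separated parts are required explicitly.
        return s.Split('.').Length == 4
            && IPAddress.TryParse(s, out IPAddress addr)
            && addr.AddressFamily == AddressFamily.InterNetwork;
    }

    public static void Main()
    {
        Console.WriteLine(AllNumericIPv4("192.167.54.133,192.167.54.134"));
        Console.WriteLine(AllNumericIPv4("c1.cs.cornell.edu,c2.cs.cornell.edu"));
    }
}
```

A check like this could run in a launch script or at the top of `Main`, so that a host name slipping into ISIS_HOSTS on Linux is caught before it can trigger a DNS hang.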

  19. ISIS_PORTp • The system uses two standard IP ports • ISIS_PORTp: for p2p messages • ISIS_PORTa: Set to ISIS_PORTp+1, for acks/nacks • These ports should not be blocked by your firewall • On Linux, also check iptables, which is like a firewall • If two instances of Isis2 use non-overlapping port ranges, they will not notice one-another.
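The port rule can be sketched as a small overlap check. Assuming, as the slide states, that an instance uses ISIS_PORTp and ISIS_PORTp+1, two instances are isolated only when those two-port ranges share no port. The class name and the port numbers used here (9000 and so on) are hypothetical, chosen only for illustration.

```csharp
using System;

public static class IsisPortsSketch
{
    // The two ports one instance uses: ISIS_PORTp for p2p messages and
    // ISIS_PORTp + 1 (ISIS_PORTa) for acks/nacks.
    public static (int PortP, int PortA) PortsFor(int portP)
    {
        if (portP < 1 || portP > 65534)
            throw new ArgumentOutOfRangeException(nameof(portP));
        return (portP, portP + 1);
    }

    // Two instances share a port exactly when their ISIS_PORTp values
    // differ by at most 1.
    public static bool Overlaps(int portP1, int portP2)
    {
        var a = PortsFor(portP1);
        var b = PortsFor(portP2);
        return a.PortP <= b.PortA && b.PortP <= a.PortA;
    }

    public static void Main()
    {
        Console.WriteLine(Overlaps(9000, 9001)); // both would use port 9001
        Console.WriteLine(Overlaps(9000, 9002)); // 9000-9001 and 9002-9003 are disjoint
    }
}
```

In practice this means that separate Isis2 deployments sharing a network should pick ISIS_PORTp values at least 2 apart.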

  20. ISIS_MAXIPMCADDRS • When permitted to use IP multicast, Isis2 tries not to overuse that feature: • ISIS_MCRANGE_LOW: low end of the IPMC address range Isis2 should use. By default, CLASSD+5000, where CLASSD is the base of the IPv4 multicast range, 224.0.0.0/4 • ISIS_MCRANGE_HIGH: high end of the IPMC range • ISIS_MAXIPMCADDRS: limit on how many multicast addresses Isis2 can use, system-wide. It is perfectly reasonable to set this to a small number, like 5 or 10. The system should work if ISIS_MAXIPMCADDRS ≥ 2. • If ISIS_UNICAST_ONLY is true, then no IPMC addresses are used at all.

  21. ISIS_TTL • Broadcast and multicast messages are automatically relayed by routers • Each “hop” causes the “time to live” field in the message to be decremented • If the TTL reaches zero, the router drops the packet • Isis2 initializes the TTL value using ISIS_TTL. • You can set this to 0 or 1 to confine the system to a single segment of your network.

  22. ISIS_MAXMSGLEN • Automatically adjusted but you can provide a recommended value if you wish • Isis2 will override the value in some situations • Normally not something you would need to modify • If a message is too large, Isis2 will automatically fragment it and reassemble it prior to delivery

  23. Other limits and timeouts These are less often changed

  24. ISIS_DEFAULTTIMEOUT • Normally 45 seconds. OK to reduce if you wish. • Failure detection needs twice this long, hence 90 seconds. • This applies if you kill a process “suddenly” (e.g. ^C) or if the machine on which it was running crashes • 45 seconds is very slow, but on cloud computing systems long delays happen more often than you would expect! • On lightly loaded clusters, you can set ISIS_DEFAULTTIMEOUT much lower, but not less than 2 seconds. • If you design a failure sensing solution of your own, call Isis.ProcessFailed(who) to tell us if a process crashes.
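The arithmetic behind these numbers is simple enough to sketch. The C# below works purely in seconds and encodes the two facts from the slide: detection takes about twice the timeout, and the timeout should not go below 2 seconds. The class and method names are hypothetical, and the slide does not say what unit the ISIS_DEFAULTTIMEOUT variable itself expects, so the sketch does not set it.

```csharp
using System;

public static class FailureTimeoutSketch
{
    public const int MinTimeoutSeconds = 2; // lower bound given on the slide

    // Approximate time to detect a sudden failure: twice the timeout.
    public static int DetectionSeconds(int timeoutSeconds)
    {
        if (timeoutSeconds < MinTimeoutSeconds)
            throw new ArgumentOutOfRangeException(
                nameof(timeoutSeconds), "timeout must be at least 2 seconds");
        return 2 * timeoutSeconds;
    }

    public static void Main()
    {
        Console.WriteLine(DetectionSeconds(45)); // default timeout: prints 90
        Console.WriteLine(DetectionSeconds(2));  // lowest recommended: prints 4
    }
}
```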

  25. Help! I’ve been poisoned! • If a process throws this exception, it means that some other process thought it had failed • If a dead process reappears, live members send it a “you have been poisoned” message • Prevents system partitioning • Rule in Isis2: Only allow a single partition to remain alive at one time. If a partition forms, immediately shut one side down (the side lacking a majority)
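The single-partition rule can be illustrated with a strict-majority test. This C# sketch shows the rule as stated on the slide, not Isis2's actual implementation; the class and method names are hypothetical.

```csharp
using System;

public static class PartitionRuleSketch
{
    // A partition may stay alive only if it holds a strict majority of the
    // members known before the split.
    public static bool MayStayAlive(int partitionSize, int totalMembers)
    {
        if (totalMembers < 1 || partitionSize < 0 || partitionSize > totalMembers)
            throw new ArgumentOutOfRangeException();
        return 2L * partitionSize > totalMembers;
    }

    public static void Main()
    {
        // Five members split 3 / 2: only the side with 3 keeps running.
        Console.WriteLine(MayStayAlive(3, 5)); // True
        Console.WriteLine(MayStayAlive(2, 5)); // False
        // Four members split 2 / 2: neither side has a majority.
        Console.WriteLine(MayStayAlive(2, 4)); // False
    }
}
```

Because at most one side of any split can hold a strict majority, this test guarantees that two partitions never stay alive at the same time. The cost is that an even split leaves no side running.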

  26. Speeding up failure detection • If a process will exit (rather than crash), call IsisSystem.Shutdown() first. • This rapidly announces the departure and the process will immediately be removed from groups it belongs to • Like a fast failure notification – as if it said “bye!” • You can also eliminate a group rapidly (without killing its members) using g.Terminate()

  27. Hints for EC2 users • On EC2 we recommend using ISIS_UNICAST_ONLY • EC2 gives you a “virtual cluster” with nodes numbered from IP address xxx.xxx.xxx.0. You can use this range to set ISIS_HOSTS even before launching your application • If you use the Master/Worker startup mode, you can tell the system the master is at: • new Address(xxx.xxx.xxx.0, 0); • This works because the master will run on node xxx.xxx.xxx.0 (due to ISIS_HOSTS) and the pid is ignored in the BeWorker call, so using 0 is fine.

  28. Debugging Isis2 issues How can it be done?

  29. Debugging is hard… • … and debugging distributed systems is even harder • Useful tools • Visual Studio. Keep in mind that even an exception thrown inside Isis2 could be caused by a mistake in your code. All those upcalls will be issued from Isis2 stacks! • You can call IsisSystem.GetState() to obtain a string representing the state of the Isis system itself. But you’ll need help from Cornell experts to understand this data. • You can call IsisSystem.RunTimeStatsState() to obtain a self-explanatory string with counts of messages sent and received. The data itself is in IsisSystem.RTS, and you can access this at runtime.

  30. Suggestions • Isis2 is multithreaded. So write thread-safe code. • Don’t block during upcalls from Isis2 into your code. The library assumes that upcalls will complete quickly and could malfunction otherwise. • Isis2 has a lot of threads. Don’t let this worry you. • We gave you the source code. If you notice a bug, post it to isis2.codeplex.com on the “issues” page • Post questions on the codeplex “discussions” page
