|
The Duke Shared Cluster Resource (DSCR) is a shared set of machines, some provided by CSEM and some owned by other research groups. These are not true "Beowulf" clusters; they are simply sets of Intel x86-based machines running Linux. The machines have no keyboards or monitors, so you must log in to the system using ssh. See the Figure below for a better description of the network arrangement.

There are 4 "front-end" machines that users must log in to first. The names of these head nodes are cluster1.csem.duke.edu through cluster4.csem.duke.edu. Machines cluster1 and cluster2 are identical 32-bit machines, and all users may log in to either one. Machines cluster3 and cluster4 are identical 64-bit machines, and again all users may log in to either of them. Note that the machine type you compile on, 32- or 64-bit, determines where that executable may be run. You may want to log in to one type, compile your code and rename the executable with a '32', then log in to the other type and recompile, renaming that executable with a '64'.

Once you are logged in to any front-end, you can log in from there to any node in the cluster. Most of your work will be done on the front-ends: compilation, job submission, debugging. The only time you may need to log in directly to any node is for parallel debugging.

The basic layout, or topology, of the cluster is that of a tree. Groups of 20 machines are connected to a single high-speed switch. Those switches are then connected to a higher-level switch, which handles network connections between groups of machines. Note that this means some parallel jobs may see unexpected delays if they happen to span multiple switch subdomains.

To learn more about gaining access to the DSCR, see: http://www.csem.duke.edu/support/cluster.html If you are a member of a group that already participates in the DSCR, please direct your new account request through your designated PointOfContact.
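The '32'/'64' renaming convention described above can be sketched in a few lines of Python. This is only an illustration, not a DSCR-provided tool: the function names `frontend_bits` and `tagged_executable` and the program name `mysim` are hypothetical, and the only facts taken from the text are that cluster1 and cluster2 are 32-bit and cluster3 and cluster4 are 64-bit.

```python
import re

# Word size of each front-end, as stated in the text:
# cluster1/cluster2 are 32-bit, cluster3/cluster4 are 64-bit.
FRONTEND_BITS = {1: 32, 2: 32, 3: 64, 4: 64}


def frontend_bits(hostname):
    """Return 32 or 64 for a front-end name such as 'cluster3.csem.duke.edu'.

    Hypothetical helper: accepts either the short name ('cluster3')
    or the fully qualified name.
    """
    match = re.fullmatch(r"cluster([1-4])(\.csem\.duke\.edu)?", hostname)
    if match is None:
        raise ValueError(f"not a DSCR front-end: {hostname!r}")
    return FRONTEND_BITS[int(match.group(1))]


def tagged_executable(base_name, hostname):
    """Append '32' or '64' to an executable name, matching the build host."""
    return f"{base_name}{frontend_bits(hostname)}"


if __name__ == "__main__":
    # 'mysim' is a made-up program name used only for illustration.
    print(tagged_executable("mysim", "cluster1.csem.duke.edu"))  # mysim32
    print(tagged_executable("mysim", "cluster3"))                # mysim64
```

Keeping the word size in the file name makes it harder to submit an executable to a machine type it was not compiled for.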
The cluster is connected in something of a "fat star" topology. Groups of 20 machines are connected to "edge" switches via 1Gbps Ethernet. Those edge switches are connected to the "core" switch via 4Gbps links.

![]()

The machine labeled "clustermon" monitors the system. You cannot log in to this machine, but you can point a web browser at it to find the status of machines in the cluster: http://clustermon.csem.duke.edu/ganglia/

The last machine to mention is the file server for the cluster. Again, you cannot log in directly to this machine. To transfer files in and out of the cluster, use

It is worth noting that there are two separate networks in the Figure, one facing campus and one internal to the cluster. From the "outside" world, only the front-ends are visible, and they are named as above: cluster[1/2/3/4].csem.duke.edu. On the internal network, those same machines are named head[1/2/3/4]. You cannot log in directly to a compute node from the campus network.
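The link speeds quoted above (20 machines at 1Gbps per edge switch, 4Gbps from each edge switch to the core) give a rough idea of why a job that spans switches can slow down. The following is a back-of-the-envelope sketch only: it assumes every active machine sends all of its traffic across the uplink at the same time and ignores protocol overhead. The function names are hypothetical.

```python
# Figures taken from the text above.
HOSTS_PER_EDGE_SWITCH = 20
HOST_LINK_GBPS = 1.0
UPLINK_GBPS = 4.0


def oversubscription_ratio(hosts=HOSTS_PER_EDGE_SWITCH,
                           host_gbps=HOST_LINK_GBPS,
                           uplink_gbps=UPLINK_GBPS):
    """Total host bandwidth under one edge switch divided by its uplink."""
    return hosts * host_gbps / uplink_gbps


def worst_case_share_gbps(active_hosts,
                          host_gbps=HOST_LINK_GBPS,
                          uplink_gbps=UPLINK_GBPS):
    """Per-host bandwidth if `active_hosts` all send across the uplink.

    Assumption: the uplink is shared evenly, and no host can exceed
    its own link speed.
    """
    if active_hosts < 1:
        raise ValueError("active_hosts must be at least 1")
    return min(host_gbps, uplink_gbps / active_hosts)


if __name__ == "__main__":
    print(oversubscription_ratio())     # 5.0
    print(worst_case_share_gbps(4))     # 1.0
    print(worst_case_share_gbps(20))    # 0.2
```

Under these assumptions, up to 4 machines can talk across switches at full speed, while 20 machines doing so at once would each see about a fifth of their link speed.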
|