Wednesday, February 22, 2012

Content moved from my old work blog

All of the posts added today are old posts from my work blog.  I don't work there anymore and I can't expect them to keep my account open forever, so here it is.  A lot of it is probably out of date, but it's a good reference for future issues.

Providing print services with Samba

The instructions are Debian Lenny-specific as far as installing packages goes, but the rest should be applicable to whatever inferior distribution of *nix you choose (just kidding). These instructions do not cover raw print queues, in which the clients use their own driver to format the print job and CUPS passes the job as-is to the printer. That method does not work with quotas because page counting is done in the pstops filter. The print server is assumed to be on the same hardware as the samba server. If it isn’t, you’ll have to configure CUPS to accept jobs from the samba box and make changes to the smb.conf that point to the location of the CUPS server.

Software and Files

Start with a working Samba member server, then install the following packages:
aptitude install cups
aptitude install hplip
# HP laserjet PPDs for just about every HP LJ made. 
# Don't bother if you are not setting up an HP queue, of course.
 
If you aren’t setting up an HP printer, get a PPD for Windows NT/2000 for your printer and put it on your print server. You can use NT PPDs with CUPS without problems. You will need to get the following files from a Windows client, located in %WINDIR%\SYSTEM32\SPOOL\DRIVERS\W32X86\3 or %WINDIR%\SYSTEM32\SPOOL\DRIVERS\X64\3 on a 64 bit client:

ps5ui.dll
pscript.hlp
pscript.ntf
pscript5.dll
 
You will also need to get the following from http://www.cups.org/software.php, located in the cups-windows-6.0-source.tar.gz package (in the i386 folder):

cups6.inf
cups6.ini
cupsps6.dll
cupsui6.dll
 
Place all of those files, the CUPS and the Windows ones, in /usr/share/cups/drivers.
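As a concrete sketch, assuming the tarball unpacks to a cups-windows-6.0/i386 directory and you copied the Windows DLLs to /tmp/winfiles (adjust both paths to match reality):

mkdir -p /usr/share/cups/drivers
tar xzf cups-windows-6.0-source.tar.gz
cp cups-windows-6.0/i386/cups6.inf cups-windows-6.0/i386/cups6.ini \
   cups-windows-6.0/i386/cupsps6.dll cups-windows-6.0/i386/cupsui6.dll \
   /usr/share/cups/drivers/
cp /tmp/winfiles/ps5ui.dll /tmp/winfiles/pscript.hlp \
   /tmp/winfiles/pscript.ntf /tmp/winfiles/pscript5.dll \
   /usr/share/cups/drivers/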

Set Up the Queues

CUPS can be left the way it is. As long as you are going to use samba for the front end, you don’t need to reconfigure CUPS to be available on the local network. By default, it listens on localhost only and that works just fine. There are two ways to set up CUPS print queues, the command line or the web interface. The web interface makes it ridiculously easy, so I recommend that method. I went to the Ken Han School of System Administration (“GUIs are for desktops, not servers”), so I use lynx:
lynx localhost:631
Then just click on links and select options from the drop-down menus until you get your queue set up. There are examples of proper entries for how you connect to your printer, plus a help page that gives more info. Most likely, you have an HP LaserJet with an embedded JetDirect, so you would enter

socket://192.168.1.10:9100
 
on the page that asks for how you connect to your printer. For a directly-connected USB printer, it should show up. If not, use lpinfo from the command line to get a list of available devices:

bullet:/home/matt# lpinfo -v
network socket
network beh
file cups-pdf:/
direct hal
direct hp:/usb/deskjet_5100?serial=ABC123456
direct hpfax
direct hp:/usb/HP_LaserJet_1022?serial=ABC123X35
network http
network ipp
network lpd
direct parallel:/dev/lp0
direct scsi
serial serial:/dev/ttyS0?baud=115200
network smb
 
See that file cups-pdf:/ entry? There is a cups-pdf package you can install so your clients can print to pdf and have the resulting file placed in some folder. That’ll be a future write-up, but it is a pretty cool feature.

If you placed a Windows PPD on your server, you will need to enter the path to it on the page that asks you for the Manufacturer.
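If you would rather not click through the web interface at all, the whole queue can be created in one shot with lpadmin. A minimal sketch using the JetDirect URI from above (the queue name and PPD path are just examples):

lpadmin -p hpljuh054 -E -v socket://192.168.1.10:9100 -P /root/hplj.ppd
# -p names the queue, -E enables it and accepts jobs,
# -v sets the device URI, -P points at the PPD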

Check that you can send a test page to the printer once the queue has been set up. If the printer and server are in different VLANs, get the appropriate firewall ports opened up. You will definitely need the print port (9100 for a JetDirect) open from the server to the printer, and you may need SNMP from the printer to the server. I haven’t set up the latter, but I think it’s possible to get status reports from the printers that way.

smb.conf Entries

There is a link on the Administration page to Export to Samba. Don’t do this yet. Make sure your smb.conf has the following two sections in it:

[printers]
    comment = All Printers
    path = /var/spool/samba
    printer admin = root, "DOMAIN\yourauthorizeddomainuser"
    guest ok = Yes
    printable = Yes
    browseable = No

[print$]
    comment = Printer Drivers
    path = /var/lib/samba/drivers
    admin users = root, "DOMAIN\yourauthorizeddomainuser"
    write list = root, "DOMAIN\yourauthorizeddomainuser"
 
The first is the share your clients get the print queues from; the second is a hidden share that holds the drivers. Check that the paths exist on your filesystem or change them to something more appropriate. You may not want guest ok = Yes, and you may want to make the printers share browseable, so make the appropriate changes. Save it, then run testparm to make sure you haven’t messed up the config file.
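If the directories don't exist yet, something like this will do (the 1777 mode on the spool, world-writable with the sticky bit like /tmp, is the usual arrangement; tighten it if your environment calls for it):

mkdir -p /var/spool/samba /var/lib/samba/drivers
chmod 1777 /var/spool/samba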

Stop winbind, restart samba (smbd and nmbd), restart winbind.
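On Lenny that boils down to:

/etc/init.d/winbind stop
/etc/init.d/samba restart   # restarts both smbd and nmbd
/etc/init.d/winbind start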

Using rpcclient

Now to see if samba sees the queues.

rpcclient mysambabox -U DOMAIN\\myauthorizeddomainuser
Password:
rpcclient $> enumprinters
    flags:[0x800000]
    name:[\\sambabox\hpljuh054]
    description:[\\sambabox\hpljuh054,,HP LaserJet P3005]
    comment:[HP LaserJet P3005]

    flags:[0x800000]
    name:[\\sambabox\hpljuh052]
    description:[\\sambabox\hpljuh052,,HP LaserJet 2430]
    comment:[HP LaserJet 2430]
 
In the description, the second field is blank. That’s where the driver is listed after exporting the printer to samba via cupsaddsmb. At this stage, if you type ‘enumdrivers’ at the rpcclient prompt you’ll get an error message. Type ‘exit’ to get out of the rpcclient prompt and go back to the CUPS web interface. Click on the Administration link, then click on Export Printers to Samba. Use the check boxes to select the printers you want to share, enter your DOMAIN\youraccount and password, click Export Printers to Samba, and cross your fingers. If it fails, you get a page saying the action was unsuccessful, with a link to a less-than-useful log of what happened. Upon success, or partial success, you get a very encouraging page. Don’t believe it until you go back to the rpcclient prompt and verify everything was registered properly.

It's extremely important to either use a root account (if your samba server is part of a samba domain) or set the permissions on the print directories to be writable by the domain account used when exporting printers. It is also necessary to grant printing privileges to the account being used to export printers. Insufficient rights will get you unhelpful error messages and lots of frustration.
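Since the web export is just a front end to cupsaddsmb, you can also run the export from the shell, which makes the errors a little easier to capture. A sketch (the host and account are placeholders):

cupsaddsmb -H sambabox -U 'DOMAIN\youradminuser' -v hpljuh054
# -H is the samba server, -U the authorized account, -v gives verbose
# output; use -a instead of a queue name to export every printer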

rpcclient $> enumprinters
    flags:[0x800000]
    name:[\\sambabox\hpljuh054]
    description:[\\sambabox\hpljuh054,hpljuh054,HP LaserJet P3005]
    comment:[HP LaserJet P3005]

    flags:[0x800000]
    name:[\\sambabox\hpljuh052]
    description:[\\sambabox\hpljuh052,hpljuh052,HP LaserJet 2430]
    comment:[HP LaserJet 2430]
 
Now the driver name appears in the description. Check for the registered drivers with

rpcclient $> enumdrivers

[Windows NT x86]
Printer Driver Info 1:
    Driver Name: [UH054-LaserJet]

Printer Driver Info 1:
    Driver Name: [UH054LaserJet]

Printer Driver Info 1:
    Driver Name: [hpljuh054]

Printer Driver Info 1:
    Driver Name: [UH052LaserJet]

Printer Driver Info 1:
    Driver Name: [hpljuh052]

Printer Driver Info 1:
    Driver Name: [UH054-LaserJet]
 
Now you are ready to connect from a client. Connect to the share name listed above from a domain client and you should see a queue window pop up after a short delay. Drivers are downloaded automagically to the client with this setup, just like on a real Windows server.

Comments

This is the part where I warn you that you may follow the directions here and at the links below and still have issues. I think that samba sometimes emulates a Windows server too well and takes time for some things to register, which isn’t typical of a *nix service. I had a really hard time with exporting the queues from CUPS to samba. After moving on to another project for a few days and then finally getting back to finishing this one, it Just Worked(tm). I can’t tell you exactly what wasn’t working before or what made it work correctly in the end, as I just picked up where I left off and started the procedure fresh with success.

Update: I've since done this again on a completely different system and found that cupsaddsmb expects you to use a root/admin account in order to create the /var/lib/samba/drivers/W32X86/3/ directory structure. That seems obvious, as it's consistent with samba requiring both the Linux and Windows permissions to be correct, but it's easy to forget that a Windows domain account with print admin privileges is not allowed to create the directories. This became very obvious when the LDAP root/admin account was deactivated and a Domain Admin account was unable to add printers.

This how-to gets you a pretty decent print server with basic features like automatic driver download. Quotas can be established, delegated administrators can be declared for particular queues, and some other stuff can be done (like custom CUPS filters that reject jobs in certain formats, such as the .psd files students insist on plugging up queues with).

Future Notes

Setting up quotas is next. The CUPS line is like this:

lpadmin -p myprinter -o job-quota-period=6048000 -o job-page-limit=150
 
The above can only be run after the queue is set up in CUPS, as it modifies an existing queue. You can specify all of that stuff when you make the queue if you do it from the command line, but it is a hassle. The -p specifies the queue to modify and the -o flags set options on the queue. The job-quota-period is specified in seconds, so the above would be ten weeks, and job-page-limit is, of course, in pages. I have not tested this yet; when I do, I’ll write up anything odd that needs to be done besides the above line.

There are a few things to keep in mind. Quotas apply to every user on a particular queue; you cannot specify one quota for faculty and one for students. For accurate page accounting, the job has to pass through the pstops filter. An image file typically goes through the imagetops filter and gets a default count of ‘1.’ This isn’t terrible, as most images are one page anyway, but if someone manages to split an image file across multiple pages the count will be incorrect. Lastly, the print server does not give a very useful message when a user has reached their limit, just something cryptic to the effect of ‘error sending job’.

Resources

Samba Cups How-To
cupsaddsmb man page
PostScript

Disk-based virtual machine how to

Procedure for creating PV machines with Debian Etch amd64. The procedure is much the same for Lenny; the differences are noted in each step.

Turn on HVM support in the BIOS. It's not necessary for PV, of course, but then it's already on if you decide to do some HVM machines later.

The following assumes that you did a standard installation of Debian, using LVM with the guided partitioning scheme. Guided partitioning results in most of the disk space being allocated to the /home directory. If you do manual partitioning from the installer, skip to debootstrap.

Install xen, xen-hypervisor-3.2-1-<arch>.
Edit /boot/grub/menu.lst, adding "console=tty0 console=hvc0" to the end of the "module /boot/vmlinuz" line for the xen entries. Also edit the inittab to have the following entries:

1:2345:respawn:/sbin/getty 38400 hvc0
2:23:respawn:/sbin/getty 38400 tty1 # I just commented out the original entries and changed 1 and 2 to match these.

Resize the /home partition.
umount /dev/debian0/home # substitute debian0 with the volume group name
e2fsck -f /dev/debian0/home
resize2fs /dev/debian0/home 4.5G # dirty way of ensuring you don’t stomp
lvreduce -L 5G /dev/debian0/home # the end of the file system
resize2fs /dev/debian0/home
mount -t ext3 /dev/debian0/home /home

Create a new partition in the freed space.
lvcreate -L 5G -n ns0 debian0 # -n (lv_name) vg_name
lvcreate -L 256M -n ns0-swap debian0
lvscan -v # make sure it’s there
mke2fs /dev/debian0/ns0
tune2fs -j /dev/debian0/ns0
mkswap /dev/debian0/ns0-swap
mount /dev/debian0/ns0 /mnt

Use debootstrap to install a minimal system on the partition. If you have a lot of VMs to make, you could run the following line, then tar up the /mnt directory and untar it in the next partition. Another option is to use the --make-tarball FILE option for debootstrap so you have a tar file of all the .debs locally and then use --unpack-tarball FILE when you do the next VM. Your mirror might thank you.

debootstrap etch /mnt http://debian.osuosl.org/debian # pick a mirror, use lenny instead to get the current stable
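A sketch of the tarball trick mentioned above (the tarball path is just an example; the first run only downloads the packages, the second actually bootstraps from them):

debootstrap --make-tarball=/root/etch-debs.tar etch /mnt http://debian.osuosl.org/debian
debootstrap --unpack-tarball=/root/etch-debs.tar etch /mnt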

On the dom0, make sure (network-script network-bridge) and (vif-script vif-bridge) are uncommented for Lenny.  For Etch, I used (network-script network-dummy).  This gives you simple networking where the physical ethernet device and all of the virtual ethernet devices belonging to the domUs are attached to a virtual bridge once everything below gets configured.
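In other words, /etc/xen/xend-config.sxp on Lenny should contain:

(network-script network-bridge)
(vif-script vif-bridge)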

Edit /mnt/etc/network/interfaces.
auto lo
iface lo inet loopback
Edit /mnt/etc/fstab.
proc /proc proc defaults 0 0
/dev/sda1 / ext3 defaults,errors=remount-ro 0 1
/dev/sda2 none swap sw 0 0

Unmount /mnt.
cd /
umount /mnt
Create a config file for the VM (/etc/xen/ns0.cfg).
# -*- mode: python; -*-
kernel = "/boot/vmlinuz-2.6.18-4-xen-amd64" # put the appropriate kernel entry here
ramdisk = "/boot/initrd.img-2.6.18-4-xen-amd64"
memory = 256
name = "ns0"
vif = [ 'bridge=xenbr0' ]
disk = [ 'phy:/dev/debian0/ns0,sda1,w', 'phy:/dev/debian0/ns0-swap,sda2,w' ]
ip = "192.168.1.10"
netmask = "255.255.255.0"
gateway = "192.168.1.1"
hostname = "ns0"
root = "/dev/sda1 ro"
extra = "console=hvc0 xencons=tty" # console to serial, xencons sends display to vga

Start the domU.
xm create ns0.cfg -c

Log in as root (no password yet!) and set a password. base-config is not installed by debootstrap, so you have to do the following steps manually.

Set the timezone.
vi /etc/default/rcS # Set how hardware clock is interpreted, UTC or local
tzconfig

Lenny uses this instead:
dpkg-reconfigure tzdata

Configure networking through the following files (on the domU.)
vi /etc/network/interfaces
auto lo
iface lo inet loopback
# for static ip
auto eth0
iface eth0 inet static
address 192.168.1.10
network 192.168.1.0
netmask 255.255.255.0
broadcast 192.168.1.255
gateway 192.168.1.1

vi /etc/resolv.conf
search some.com
nameserver 192.168.1.2
nameserver 192.168.1.3

vi /etc/hostname
myhostname

vi /etc/hosts
127.0.0.1 localhost myhostname
# the following is for IPv6 support
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts


Set the hostname.
hostname myhostname

Make sure your networking is configured correctly.
/etc/init.d/networking restart
ping localhost
ping 192.168.1.10
ping 192.168.1.9 # whatever the dom0 is
ping 192.168.1.50 # some host outside of the box

Edit apt sources.
vi /etc/apt/sources.list
deb http://debian.osuosl.org/debian/ etch main
deb-src http://debian.osuosl.org/debian/ etch main
deb http://security.debian.org/ etch/updates main
deb-src http://security.debian.org/ etch/updates main

Run aptitude update.
Install locales and udev (debootstrap doesn’t install either and you’ll get error messages on a lenny system about no /dev/pty, which are probably harmless but annoying.)

aptitude install locales udev
dpkg-reconfigure locales
# pick en_US UTF-8
Debootstrap installs a minimal system; run tasksel to install a more complete one.
tasksel install standard

To check out configured volume groups:
vgdisplay

Helpful commands for Xen:
^] # CTRL-] To exit the domU console
xm console domUname # To reconnect to a console
shutdown -h now # To shutdown the VM from within the VM.
xm shutdown domUname # To shutdown from dom0.
xm create /path/to/config -c # Start a virtual machine and attach to its console.
xm destroy domUname # Stop a virtual machine dirty.
xm list # List all running VMs.
xm help

References:
Xen
http://wiki.xensource.com/xenwiki/XenFaq
http://wiki.xensource.com/xenwiki/DebianDomU
http://wiki.xensource.com/xenwiki/XenOnUbuntu64
http://www.debian-administration.org/articles/396

Debian
http://www.debian.org/releases/etch/alpha/apds03.html.en#id2549076
http://www.mail-archive.com/debian-alpha@lists.debian.org/msg24209.html
http://wiki.debian.org/Xen

LVM
http://riseuplabs.org/grimoire//storage/lvm2/#reducing_size

Prepping Sanako clients for Ghost management

Using Ghost to manage lab machines is really straightforward when the clients are using fairly standard software; however, things get interesting fast when specialized software is required for the computers. The Multimedia Language Center in the World Languages Department has one lab set aside for the use of Tandberg/Sanako language learning software. The vendor did the initial installation around 1999, which involved a complicated hardware remote desktop system integrated with VCR and cassette decks in addition to a file server and a database server. There are 21 clients and one teacher console in the lab. These computers all communicate directly with the file server, which holds mp3 files of phrases spoken in various languages, and the database server, which actually serves as an intermediary between the clients and file server. That requires a little more explanation: the file server is directly accessible from all clients; however, the database server provides a front end to each client to ease searching for files. If students are sophisticated enough, they can skip the db, called Library Pilot. Most, if not all, use the Library Pilot. A recent upgrade to the Sanako software was installed by the vendor, which apparently now utilizes a software-based remote desktop. The new version required the use of Windows XP in order to take advantage of all the software features, so the old Dell GX110s were fitted with more memory to handle the additional operating system requirements. New hardware was purchased to fully update the lab: 22 Dell Optiplex 745s. This provided the opportunity to understand the interdependencies better and switch the lab over from Hard Drive Sheriff (files restored locally) to Ghost.

Having been down this path before, some difficulties were expected. I’ve had issues with Quark, SPSS, and a few other applications. It took a week of testing and resetting, and testing again, to determine that the Teacher Console and the clients are the hardest to set up (well, not anymore.) The Teacher Console (TC) has a file that maps MAC addresses to station numbers, which are displayed in a grid in the main Lab 300 window. The client has to report its station number from two applications: rclnt and the Duo player. It is possible to call up rclnt -ui and set the station number manually, but this does not propagate to the Duo player. Installing from the TC, as described in the manual, runs through a wizard that asks for the station number and sets both properly. It is even possible to run the setup again after installation to set the station number. That’s no good for managing with Ghost. We want to get an image from one machine and have the rest set themselves automagically.

The keys that need to be set are:

HKLM\SOFTWARE\Sanako\Shared Components\NetCommPlatform\Client\Client ID REG_DWORD 0x0000000y (y = station number in hexadecimal)
HKLM\SOFTWARE\Sanako\Shared Components\Common\ToLabNumber REG_DWORD 0x0000000y
HKLM\SOFTWARE\Sanako\Lab\Lab300\Duo\Common\ToLabNumber REG_DWORD 0x0000000y



I have no experience with Visual Basic Scripting, so my first (and hopefully last, yuck) script is below. It is super simple, would benefit greatly from a little regex and variable substitution, and won’t get any more love because it works. In a generic form, here it is:

set objShell = WScript.CreateObject("WScript.Shell")
station = objShell.RegRead _
    ("HKLM\SYSTEM\CurrentControlSet\Control\ComputerName\ComputerName\ComputerName")
If station = "Room#-01" Then
    objShell.RegWrite "HKLM\SOFTWARE\Sanako\Lab\Lab300\Duo\Common\ToLabNumber", 1, "REG_DWORD"
    objShell.RegWrite "HKLM\SOFTWARE\Sanako\Shared Components\Common\ToLabNumber", 1, "REG_DWORD"
    objShell.RegWrite "HKLM\SOFTWARE\Sanako\Shared Components\NetCommPlatform\Client\ClientId", 1, "REG_DWORD"
ElseIf station = "Room#-02" Then
    objShell.RegWrite "HKLM\SOFTWARE\Sanako\Lab\Lab300\Duo\Common\ToLabNumber", 2, "REG_DWORD"
    objShell.RegWrite "HKLM\SOFTWARE\Sanako\Shared Components\Common\ToLabNumber", 2, "REG_DWORD"
    objShell.RegWrite "HKLM\SOFTWARE\Sanako\Shared Components\NetCommPlatform\Client\ClientId", 2, "REG_DWORD"
...
ElseIf station = "Room#-21" Then
    objShell.RegWrite "HKLM\SOFTWARE\Sanako\Lab\Lab300\Duo\Common\ToLabNumber", 21, "REG_DWORD"
    objShell.RegWrite "HKLM\SOFTWARE\Sanako\Shared Components\Common\ToLabNumber", 21, "REG_DWORD"
    objShell.RegWrite "HKLM\SOFTWARE\Sanako\Shared Components\NetCommPlatform\Client\ClientId", 21, "REG_DWORD"
End If
objShell.RegWrite "HKLM\SOFTWARE\Sanako\Lab\Lab300\Duo\Path\Default Open Path", "\\someserver\someshare", "REG_SZ"

The client name is pulled from the ComputerName key, then a simple if/else block handles the rest. That’s how I roll when I can’t use a case block. Kidding, kidding. I used vbs because the script interpreter is built in and I thought that as much as I would have preferred to bust out a quick python script, this little task didn’t really justify installing the interpreter. That last line is the value of the default open path that appears when File->Open is selected in the Duo player.

The last thing to do is to add the Group Policy Object snap-in to the MMC, open Computer Configuration->Windows Settings->Scripts, double click Startup, then click Add->Browse, find your script (I put it in the folder that Browse starts at), then click Apply. The script will run at each startup; it is small and fast enough that there isn’t any real performance hit on a 3.4GHz box to worry about :) If you are super uptight about those things, then configure it to run once and let it go. Just don’t forget not to run it before capturing an image.

If you are seeing weirdness like all the client requests showing up on the TC as station 1 or messages from the TC to the client going to the wrong client, double check your registry entries.

I have not configured the clients for Library Pilot. The media files are shared read-only and revert to the top level of the share on each log in. I still have to figure out how to get it to revert after every use in the event that someone does not log out and someone else sits down to work (these are kiosk stations, no individual accounts). The other thing that I don’t have working is the intercom function between the TC and clients, but I suspect that this has to do with the hardware that integrates the VCR and cassette decks. One final note: Dell Optiplex 745s are not compatible with the Duo player because the onboard sound does not have a separate Microphone In. I used the Creative cards that were in the older Dell GX110s and they work fine.

As for firewall configuration, Sanako provides this document. The port numbers, off hand, are 6100, 6101, and 6102. I’ll have to verify these as I don’t have the notes handy.

Debian Etch and Xen

Debian Etch and Xen 3.0.3 on amd64
(based heavily on Debian Sid gets Xen 3.0)
This walk-through will get you set up with the tools you need and the hard configuration stuff for getting file-backed virtual xen machines running. If you want to run partition (disk)-based VMs, I have another page for that, but you will still need to get a bunch of the packages listed here as well as set up the interfaces file. We will need the following packages:
xen-hypervisor-3.0.3-1-amd64
xen-utils-3.0.3-1
linux-image-2.6.18-3-xen-amd64
bridge-utils
iproute
sysfsutils
xen-tools
So here we go. Start with:
apt-get update
Then install the following (substituting a current kernel and package versions):
root@debian:~# apt-get install xen-hypervisor-3.0.3-1-amd64 xen-utils-3.0.3-1 linux-image-2.6.18-3-xen-amd64
I prefer to run stock kernels; if you want to run a custom kernel you are on your own. Check out the xen-users mailing list archives if you run into trouble, as the topic comes up occasionally. Once apt has done its thing, install the other necessary packages:
root@debian:~# apt-get install bridge-utils iproute sysfsutils xen-tools
The defaults in /etc/xen/xend-config.sxp are fine, we are going to change /etc/network/interfaces to handle networking. Open up your favorite editor and change /etc/network/interfaces, removing the eth0 entry and adding:
auto xenbr0
iface xenbr0 inet static
address 192.168.0.10
netmask 255.255.255.0
network 192.168.0.0
bridge_ports eth0
Reboot the machine and you are ready to use xen-tools. I will refer you to the end of Debian Sid gets Xen 3.0, since I would just be copying it here.
The most difficult part in setting this up was the entry in /etc/network/interfaces, as I couldn’t find any documentation on it. I followed at least three separate threads on xen-users related to similar networking issues. A kind soul finally shared the magic and said that the bridge had to be brought up and it would bring up the interfaces attached to it.
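Once the box is back up, a quick sanity check that the bridge exists and picked up the physical interface (each running domU should later show up as a vif device on the same bridge):

brctl show
ip addr show xenbr0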

Resources
http://www.debian-administration.org/articles/396

Configuring awstats

I recently moved my web server from Apache 1.3.x to 2.2.3 and thought it would be a good time to get awstats configured. It had been installed for a long time but was pretty useless, since I had never done more than ‘aptitude install awstats’. Anyway, reading the docs and the README.Debian file made it seem like it was going to be a huge pain. It wasn’t. I should know by now that there is an inverse relationship between the amount of documentation and the difficulty.

These notes are specific to Debian, so adjust accordingly. Install awstats:

aptitude install awstats

Set your log files to be output in the combined format:

ErrorLog /var/log/apache2/mydomain_error.log
CustomLog /var/log/apache2/mydomain_access.log combined

I have a number of virtual hosts, so I have that defined for each host within its <VirtualHost> block.

The install script for awstats did its best to figure out what I had and write a default awstats.conf file, but it wasn’t nearly correct. Copy that into files named for your domains, like this:

cp awstats.conf awstats.mydomain.com.conf

Then copy awstats.conf to awstats.model.conf:

cp awstats.conf awstats.model.conf

That file gets ignored and the name reminds you that it’s a template. Edit each of the other .conf files to point to your log files, log type, log format, site domain, host aliases, DNS lookups, etc. There is an optional setting called AllowAccessFromWebToFollowingIPAddresses where you can define who can see your cgi page. I set this to my network range, but I also set it in Apache because I trust that one to be a stronger security measure. CGI gives me the willies. Anyway, the file is heavily commented so check it out and set what you need to. Once you have them set, the docs say to run the awstats.pl script on each one manually the first time:

/usr/lib/cgi-bin/awstats.pl -config=mydomain.com -update > /dev/null
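For reference, the per-domain settings mentioned above end up looking something like this in awstats.mydomain.com.conf (the values are examples, and the file documents plenty more):

LogFile="/var/log/apache2/mydomain_access.log"
LogFormat=1
SiteDomain="mydomain.com"
HostAliases="www.mydomain.com localhost 127.0.0.1"
AllowAccessFromWebToFollowingIPAddresses="127.0.0.1 192.168.1.0-192.168.1.255"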

Once you have the config files set, you have to allow the user running awstats to read the log files. By default, that is www-data, which does not have access to the logs. You can change it in /etc/logrotate.d/apache2, so that www-data has group ownership. This may not be the best thing to do and there are some other options which are spelled out in the README.Debian file. The other change you’ll have to make is to the default cron script installed in /etc/cron.d/awstats. It is only set to run on awstats.conf, which won’t give you anything. You can comment out that line, copy it, and adjust it for each of your domains, or you can put this script somewhere and have it parse your awstats config directory:

#!/bin/bash
for conf in `/bin/ls /etc/awstats | /bin/sed 's/^awstats.//' | /bin/sed 's/.conf$//'`
do
    if [ $conf = 'model' -o $conf = 'conf.local' ]; then
        continue
    else
        access=${conf%%.*}'_access.log'
        if [ -x /usr/lib/cgi-bin/awstats.pl -a -r /var/log/apache2/$access ]; then
            /usr/lib/cgi-bin/awstats.pl -config=$conf -update > /dev/null
        fi
    fi
done

That script is tailored for config files that are named like

awstats.mydomain.com.conf

and log files that are named like

mydomain_access.log

Adjust it to fit your naming scheme. The big benefit is that it will examine your config directory and run the update for each domain configured, so you won’t have to remember to add new domains to the cron script manually. I saved the script as /usr/local/bin/awdomlist, which is referenced below.

One last thing (for now, anyway) is a prerotate script in /etc/logrotate.d/apache2. There is a warning in the README.Debian file about losing time in the awstats database due to logs being rotated before the last update of the day is made. Add this to the file, just above the postrotate entry:

prerotate
/usr/local/bin/awdomlist
endscript

I don’t know that it makes any difference, but it seems like a ‘pre-’ ought to go before a ‘post-’.
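For context, the stanza ends up looking roughly like this (keep whatever other directives Debian shipped; only the prerotate block is new):

/var/log/apache2/*.log {
        # ...existing directives (weekly, rotate, compress, etc.)...
        prerotate
                /usr/local/bin/awdomlist
        endscript
        postrotate
                # ...existing apache reload commands...
        endscript
}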

Copy /usr/share/doc/awstats/examples/apache.conf to /etc/apache2/conf.d/awstats (or whatever) and read the comments in the file. It is set so that anyone (!) can execute stuff in /usr/lib/cgi-bin, so you probably want to limit that using the ‘Allow from’ directive.
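A sketch of the sort of tightening I mean, in Apache 2.2 syntax (substitute your own network range):

<Directory /usr/lib/cgi-bin>
    Options +ExecCGI
    Order allow,deny
    Allow from 127.0.0.1 192.168.1.0/24
</Directory>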

Check it out in a web browser at mydomain.com/cgi-bin/awstats.pl and check out the pretty graphs, sure to impress your boss. The docs say the path is mydomain.com/awstats/awstats.pl, but the configuration snippet included for Apache aliases /usr/lib/cgi-bin to /cgi-bin.

Conceptual Integrity and Design

Conceptual integrity is really a component of good design. I have examined works dealing with issues of software design and human computer interfaces published over the past thirty years. The common theme in these publications is that good software comes from good solid design. Here, I will address design as it applies to software, who we are designing for, and some general principles to keep in mind when designing interactive systems.

Some Terms

‘Design’ is an ambiguous term that seems to fill a spectrum with artistic endeavor at one end and planning at the other. So why would a person who deals in absolutes and empiricism want to consider such a nebulous concept? Because without it, any product is going to have to be significantly greater than any similar product to overcome the shortcoming of being poorly planned in implementation and in presentation. It is important to define ‘design’ in the context of software so that it has a real, tangible meaning that people can grasp and use to make better software. So here, design is going to have two meanings: the plan of the implementation and the arrangement of the interface.

Having jumped straight into the design quagmire, it would be prudent to step back for a moment and define the idea of conceptual integrity. Fred Brooks, Jr. discusses many facets of maintaining conceptual integrity in The Mythical Man-Month (chapters 4-6), and defines it as “unity of design.”[1] From Brooks and others who discuss similar ideas, this seems reasonable and will be used as the meaning here.


Design as a Process

Brooks notes that “the difference between poor conceptual designs and good ones may lie in the soundness of design method, the difference between good designs and great ones surely does not.”[2] Design methodologies can be taught and successfully applied to projects, but the best designs rely on more than that. Experience, talent, and inventiveness are required to reach that level. It also requires an understanding that there is a fuzziness to design. Realizing that design is a process, that the process is not hierarchical, that the process is dynamic, and that it leads to the discovery of new goals[3] is a lot to swallow. However, it is a necessary step in considering a project as a whole, rather than a sum of parts: “The Composition Fallacy assumes that the whole is exactly equal to the sum of its parts.”[4] Division of labor into groups working on distinct parts will result in a collection of parts unless the interactions between the parts are decided upon and well understood in advance. Cohesiveness can only be achieved if the design is established before work begins. This applies to the architecture as well as the interface.

Shneiderman quotes an author as saying “our instincts and training as engineers encourage us to think logically instead of visually, and this is counterproductive to friendly design.”[5] The problem of “friendly design” will be discussed later, for now the important part is that thinking logically is not enough. Norman proposes that “human behavior is the key”[6] to designing software that works for people and that studying people is superior to reasoning out a solution. He suggests that observing how people perform activities will lead to more natural usage. Where most things are currently structured in a “hardware store approach”, they should be structured in a way that supports how people use objects: hammers next to nails rather than hammers with hammers, nails with nails.[7] This suggests that the design philosophy is due for a paradigm shift, to include behavioral analysis in addition to logical structure.

Another area in which the philosophy may be due for a change is in the interface directly. As in art, where there may exist an implicit communication between the artist and the viewer, the design of an interface is not limited to composition and color. These aspects may be used to establish a dialog between the designer and the person using the software. Norman, after reading up on semiotics, says that once this shift has occurred, the design philosophy will change for the better.[8] He goes on to say that each decision made by the designer is done for “both utility and for communication…in the hands of good designers, the communication is intentional.”[9] Shneiderman echoes this in arguing that the smallest interactions between the system and the person are important considerations, because they happen all the time.[10]

So far, this sounds a lot like things to consider when designing the interface; however, some in the field of Human Computer Interaction, such as Alan Cooper, suggest that the way to get the most cohesive design is to study the people who are to use the software and then make the interface first. The reasoning is that making an interface that works for people at the outset gives the programmers a final goal to work towards that encompasses all of the requirements. In this sense, designing the outward appearance of the software is just as important as the part the person doesn’t see. This isn’t the only approach, of course, but it does make sense to have a distinct plan for how a person is to use the software early on. Otherwise, what would be the point in aiming for a unity of design?


Conceptual Integrity

A carpenter does not care how friendly his hammer is; he wants the handle to fit his hand, to be long enough to provide good leverage without being too long, to have a head of sufficient weight to drive nails with ease, and so on. He doesn’t want it to be friendly, he wants it to be designed for the task. Nelson declares that the “problem is not software ‘friendliness.’ It is conceptual clarity.”[11] Going back to the definition of conceptual integrity as unity of design, it becomes apparent that it comes from a solid plan that takes into account the goals, constraints, and audience for a particular project. Norman proposes that the “appropriate way to design a complex system is to develop a clear, coherent conceptual model…and to design the system so that the user’s mental model would coincide.”[12]

A good design is not just one in which all of the parts of the system are planned for, but one in which the designer has a clear understanding of the user’s conceptual model of how a task is performed. We have to work with what we have, not what we wish to be. Or, as Nelson put it, “If the button is not shaped like the thought, the thought will end up shaped like the button.”[13] People use software all the time, reshaping their tasks to fit how the software designer envisioned them. This is a result of the designer(s) missing the mark: not understanding the task domain completely, reasoning out a solution instead of observing, or not spending lots and lots of time designing the product before embarking on implementation (which probably encompasses the other two reasons as well.)

Brooks states that it is better to leave things out in order to maintain “one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas.”[14] On the face of it, it sounds reasonable; however, it raises the question: who gets to decide?


The Benevolent Dictator

If unity of design is a major goal in a project, then design must be the path to achieving it. Someone has to decide what the purpose of the project is, how it is to be implemented (at least at a high-level), what it will look like, the intended audience, and so on. This requires a solid understanding of the requirements, constraints, and vision of the final product. Whether the lead realizes it or not, these are all principles of design, whatever the end product happens to be. Norman and others have advocated that the lead be a benevolent dictator in order to enforce the design plan and settle disputes once and for all.

In two of Brooks’ essays in The Mythical Man-Month, he states that conceptual integrity can only be achieved if the design comes from one person, or a small group of people in agreement[15][16]. Nelson flatly states that there must be “a chief designer with dictatorial powers….”[17] This avoids the concessions and compromises that come about in design by committee approaches while maintaining the focused vision of the product.


Know the User

“‘Know the user’ was the first principle in Hansen’s (1971) list of user engineering principles.”[18] There is no way that a product can be a success without knowing who the person on the far end of the development cycle is. In order to communicate effectively with the person on the other end, it is critical to know who that person is and what their goals are. This person is you and I and the lady down the street in a general sense: people have certain characteristics that are fairly consistent across a broad spectrum. Things such as physical abilities and limitations, emotional reactions, modes of thinking, mental constructs and the like. Jef Raskin uses a lot of research in The Humane Interface along those lines, and the HCI community is fond of quoting things such as Fitts’ Law to explain why something should be placed in a particular location or given a particular size. The designer has to be aware of these issues and familiar enough to know whether the interface designers are fulfilling design goals or not.

People find it difficult to retain many things in short-term memory at the same time. “Seven plus or minus two chunks”[19] seems to be a good rule of thumb. If people have to be given information, it is best not to overload them with lots of dialogs, warnings, notices, etc. Taking advantage of semantic knowledge of computer concepts, which are linked to already familiar ideas, helps people by allowing them to use long-term memory constructs.[20] “Semantic knowledge is conveyed by showing examples of use, offering a general theory or pattern, relating the concepts to previous knowledge by analogy, describing a concrete or abstract model, and by indicating examples of incorrect use.”[21] If some tasks in the program can be related to tasks the person performs in other domains, then much work has been done for you.

Falling into the ‘friendly’ software mode can be more harmful than helpful. “Attributions of intelligence, independent activity, free will, or knowledge to computers can deceive, confuse, and mislead users.”[22] People may use the same semantic knowledge mentioned before to misconstrue the software (or the hardware) as having some level of intelligence if the responses are similar to what a person might say. Helpful messages are useful, friendly ones are counterproductive.

People will “respond to design, both good and bad, in appropriate manners.”[23] They will have a response, either way, so pretending to be the user and running through tasks may be an easy way to flush out flaws or remind the designer what people are trying to accomplish.

Errors are common, probably one of the most common things people do, which makes it extremely important that they are handled gracefully. Norman says errors can be avoided by organizing according to function, making choices distinctive, and making it hard to do something irreversible.[24] Gilb and Weinberg suggest a three-part design to guard against errors:

a. Select natural sequences.
b. Specify error recognition, handling, and recording.
c. If b is too hard, redesign the codes, adding redundancy to make them distinguishable from other codes in the sequence.[25]

Increasing speed and reducing errors may be accomplished with the use of defaults. Using defaults can be expected to reduce errors based on the “idea that what you don’t do you can’t do wrong.”[26] Manual entry should be a last resort, with defaults or a list of possibilities provided being preferred.[27]

Shortcuts may also be used to allow adept users a way of working faster, while providing a means for beginning users a way to advance gracefully. Typeahead and listing shortcut keys next to menu items are two methods of achieving this.[28] Making the system adaptive to frequently selected items is another means of increasing speed.

Using a subset of a natural language is another method of designing with the person’s semantic knowledge in mind. One project “tried to copy English grammar closely…did not allow the meaningful reordering of phrases permitted in English, such as ‘Into A, copy B’.”[29] It would be nearly impossible to account for every variation permitted in a natural language, but something as simple as ‘copy A into B’ is easy for the designer to account for and the user to grasp.

Data displays should be consistent, allow efficient information assimilation, require a minimal memory load, and be flexible in data display.[30] They should take advantage of the person’s semantic knowledge in order to put elements of the software into long-term memory quickly. And “the importance of long range user feedback in maintaining a system cannot be underestimated.”[31]


Methods of Approach

As knowledge in a field grows, the natural progression is for assumptions to be questioned and revised, methods changed to reflect new discoveries, and incorrect ideas thrown out. “There is an implicit assumption in performing human factors work that the systems we have already designed are somewhat flawed.”[32] This assumption is not a negative, but a realistic approach that accepts that better products are the result of examining that which has come before for what works and what does not.

Norman says that the process of evaluating human needs, field studies, and observations should be done outside of the product process.[33] His reasoning is that it is too late, from a business sense, to spend the time doing these functions when a project has already started. It costs money and delays the rest of the development team. Rather than stand in the way, the HCI research should be done when researching products to pursue. In this work flow, the development team can get straight to work on a product, with HCI people working alongside. This runs counter to Alan Cooper’s process of performing the analysis at the outset of a project (see About Face 2.0).

A four-level approach to systems design has been proposed. From highest level to lowest, these levels are:

  • Conceptual model
  • Semantic model
  • Syntax level
  • Lexical level [34]

These correspond to the big idea behind the project, the meanings of input and output, specific commands, and finally, hardware or device dependencies. The approach breaks the project into levels of detail, tackling each as necessary. It follows a process that uses as high a level of notation as possible at each step, “exposing the concepts and concealing the details until further refinement becomes necessary.”[35]

Writing formal specifications can be an extremely powerful tool. Putting design decisions to paper, or text file for that matter, forces the project lead to examine decisions as they are written and to make sure that the various elements of the design are consistent.

Once the design is formalized, it is time to focus on the programming. Nelson distilled some good programming practices into three things:

  • Don’t be afraid to start over.
  • Design long, program short.
  • When you are sure the design is right, code it.[36]

The widespread use of object-oriented programming has developers working in terms of, well, objects. This process involves mapping new data types with their operations to things in the real world. Naturally, a good understanding of the task domain results in better objects. Brooks notes, however, that instead of teaching OOP as a type of design, it has been taught as a tool.[37] Some developers have gone against this trend and built programs where the users act directly on objects, but this seems to be a small minority. With a proper use of OOP and a solid design, the task domain objects are better realized in the software. This would create products with a much stronger unity of design.

With a strong design that utilizes object flow analysis, redesigns can be performed before or after the system is used.[38] The iterative process tests design assertions, takes in user feedback, and reveals structural flaws. These things cannot be avoided, but they can be minimized by spending a lot of time on the design at the start and using an approach that accommodates changes easily.

Ultimately, the goal is to move complexity away from the user. Reducing what the person has to do makes the product more complex,[39] but that complexity has to be dealt with far fewer times and by far fewer people. Not addressing the simplification of the interface does not save work; it puts the difficulty in other areas, namely those of the user.[40]

In Directions in Human Factors for Interactive Systems, the authors proposed ten hypotheses:

  1. The inclusion of features not needed for a task interferes with task performance.
  2. The implementation of features unknown to the user interferes with task performance.
  3. Command systems should not be layered or hierarchical.
  4. Error messages should have a positive emotional tone.
  5. The user should be alerted to any potentially damaging action.
  6. Error correction should be easy and immediate.
  7. Abbreviation rules should be consistent and simple.
  8. First-letter abbreviation of command words is a superior abbreviation scheme.
  9. Command languages should be based on legitimate English phrases composed of familiar, descriptive words.
  10. Commands should be described with examples rather than in generalized form.[41]

It should be pointed out that command languages are not dead by any means. In fact, Norman argues that search engine queries are really command languages that tolerate variation and allow for some natural language variations.[42] When they fail, they tend to fail gracefully, asking for confirmation or suggesting alternatives when the command/query is not valid.

In general terms, it is important to use familiar terms and be consistent, elements need to be distinct enough from one another to avoid confusion, phrasing should be succinct, and action words prominent.[43] This is true of menu items, error messages, screen layout, and every other piece that a person encounters when using the product.

When evaluating the success of a product, “user-friendliness” is not helpful. Some measurable, meaningful criteria are time to learn, speed of performance, rate of errors, subjective satisfaction, and retention over time.[44] User satisfaction is a valid performance criterion, despite its subjectivity. A “like it” or “hate it” forecasts how likely users are to invest in a new product and make it successful or scorn it altogether.

Conclusion

Avoid missing ball for high score.[45]

Those six words were the manual for Pong. An extremely simple set of instructions for an extremely simple game. The idea of conceptual integrity has been around for some time: Ted Nelson declared conceptual simplicity a new frontier in 1974[46], Brooks wrote in 1975 that it was the “most important factor in ease of use”[47], and, well, Pong has been around forever.

The body of work in this field is huge. Here, I have only scratched the surface in discussing design in the software field, putting people’s needs at the fore in interactive systems, and some principles to accomplish that goal. To summarize, a successful project is going to depend on a lot of time spent on planning out the system, having someone lead with absolute authority, and, most importantly, focus on people.

So, the idea is not new. There must be some distractions that have focused people on tackling the wrong issues. This really goes beyond the scope of this paper and would be an interesting study on its own: what methods do projects use when designing software?

End Notes
[1] Brooks, The Mythical Man-Month, p 44
[2] Brooks, The Mythical Man-Month, p 202
[3] Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, p 391
[4] Gilb, T. and Weinberg, G., Humanized Input: Techniques for Reliable Keyed Input, pp 32-33
[5] Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, p 198
[6] Norman, D., Simplicity Is Highly Overrated, http://www.jnd.org/dn.mss/simplicity_is_highly.html
[7] Norman, D., Logic versus Usage: The Case for Activity-Centered Design, http://www.jnd.org/dn.mss/logic_versus_usage_t.html
[8] Norman, D., Design as Communication, http://www.jnd.org/dn.mss/design_as_comun.html
[9] Norman, D., Design as Communication, http://www.jnd.org/dn.mss/design_as_comun.html
[10] Ledgard, H., Singer, A., and Whiteside, J., Directions in Human Factors for Interactive Systems, p 19
[11] Nelson, Ted, Computer Lib/Dream Machines, p 25
[12] Norman, D., Design as Communication, http://www.jnd.org/dn.mss/design_as_comun.html
[13] Nelson, Ted, Computer Lib/Dream Machines, p 12
[14] Brooks, The Mythical Man-Month, p 42
[15] Brooks, The Mythical Man-Month, p 44
[16] Brooks, The Mythical Man-Month, p 35
[17] Nelson, Ted, Computer Lib/Dream Machines, p 72
[18] Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, p 53
[19] Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, p 275
[20] Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, p 50
[21] Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, p 49
[22] Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, p 322
[23] Gilb, T. and Weinberg, G., Humanized Input: Techniques for Reliable Keyed Input, p 5
[24] Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, p 63
[25] Gilb, T. and Weinberg, G., Humanized Input: Techniques for Reliable Keyed Input, p 77
[26] Gilb, T. and Weinberg, G., Humanized Input: Techniques for Reliable Keyed Input, p 26
[27] Mehlmann, Marilyn, When People Use Computers: An Approach to Developing an Interface, p 35
[28] Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, p 109
[29] Ledgard, H., Singer, A., and Whiteside, J., Directions in Human Factors for Interactive Systems, p 41
[30] Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, p 69
[31] Ledgard, H., Singer, A., and Whiteside, J., Directions in Human Factors for Interactive Systems, p 47
[32] Ledgard, H., Singer, A., and Whiteside, J., Directions in Human Factors for Interactive Systems, p 141
[33] Norman, D., Why doing user observations first is wrong, http://www.jnd.org/dn.mss/why_doing_user_obser.html
[34] Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, p 46
[35] Brooks, The Mythical Man-Month, p 143
[36] Nelson, Ted, Computer Lib/Dream Machines, p 41
[37] Brooks, The Mythical Man-Month, p 221
[38] Mehlmann, Marilyn, When People Use Computers: An Approach to Developing an Interface, p 29
[39] Mehlmann, Marilyn, When People Use Computers: An Approach to Developing an Interface, p 45
[40] Gilb, T. and Weinberg, G., Humanized Input: Techniques for Reliable Keyed Input, p 180
[41] Ledgard, H., Singer, A., and Whiteside, J., Directions in Human Factors for Interactive Systems, p 148
[42] Norman, D., UI Breakthrough-Command Line Interfaces, http://www.jnd.org/dn.mss/ui_breakthroughcomma.html
[43] Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, p 113
[44] Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, p 73
[45] Nelson, Ted, Computer Lib/Dream Machines, p 36
[46] Nelson, Ted, Computer Lib/Dream Machines, p 12
[47] Brooks, The Mythical Man-Month, p 255

Interface design bibliography

Brooks, Jr., F., The Mythical Man-Month, Boston, Addison Wesley Longman, Inc., 1995

A collection of essays wherein the author challenges assumptions in software engineering based on his experience and observations. This edition contains material added twenty years after the original was published, allowing the author to revisit his original assertions.


Gilb, Tom and Gerald M. Weinberg, Humanized Input: Techniques For Reliable Keyed Input, Reprint ed., QED Information Sciences, Inc., 1984

Major sections of the book include how inputs are designed, default messages, positional messages, and adaptive checking. The over-arching theme is to take operators into account when designing systems in order to improve working conditions and productivity.


Ledgard, H., Singer, A., and Whiteside, J., Directions in Human Factors for Interactive Systems, Berlin, Springer-Verlag, 1981

Using a system designed by the authors as a case study, a process for growing software to work with people is critiqued. The authors discuss in detail design decisions, what worked, and perhaps most importantly, what didn’t and why.


Mehlmann, Marilyn, When People Use Computers: An Approach to Developing an Interface, Englewood Cliffs, Prentice-Hall, Inc., 1981

‘Things to think about when developing terminal-based applications’ may have been a better title, although I’m probably showing my bias. The work does not go into the detail of Humanized Input, but is more of an overview or quick reference for the developer.


Nelson, Ted, Computer Lib/Dream Machines, Revised ed., Redmond, Tempus Books of Microsoft Press, 1987

This two-book book covers the big topics in the field of computers: programming languages, major corporations, enthusiasts, uses, history, and (most importantly) the impact on the everyday person. Dream Machines focuses on interactive uses of computers, such as games, movies, interfaces and the like.


Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, Reading, Addison-Wesley Publishing Company, 1987

Extremely dense work, more a collection of essays than a cohesive book. Lots of research is referenced in exploring high-level concepts, good practices in designing software for people, and the constraints people encounter physically and mentally.


Norman, D., UI Breakthrough-Command Line Interfaces, http://www.jnd.org/dn.mss/ui_breakthroughcomma.html, 2007

Command line interfaces were viewed as too complicated for novice computer users, so the graphical user interface gained prominence. Search engine commands, the author argues, are really a return to CLIs that are more flexible than the GUI and even more flexible than their predecessors.


Norman, D., Design as Communication, http://www.jnd.org/dn.mss/design_as_comun.html, 2004

An argument for a shift in thinking about design: that it is really a communication between the designer and the person using the software, with the technology as the medium. In this sense, it becomes necessary to explain why things are, not just whether they can be used or not. This helps to build understanding through conceptual models.


Norman, D., Logic versus Usage: The Case for Activity-Centered Design, http://www.jnd.org/dn.mss/logic_versus_usage_t.html, 2006

The problem with using logic to define interactions is that people don’t act logically. It is better to design around the concept of activities, grouping objects based on how they relate to each other within an activity, while providing some logical structure for things that fall outside that model.


Norman, D., Simplicity Is Highly Overrated, http://www.jnd.org/dn.mss/simplicity_is_highly.html, 2007

An argument against creating things as we think they should be, in favor of designing for how they are really used. Norman gives many examples of complex objects chosen over simplified ones for reasons completely outside of work flows, ease of use, and the like.


Norman, D., Why doing user observations first is wrong, http://www.jnd.org/dn.mss/why_doing_user_obser.html, 2006

It is too late to do user observations once the project has started. The research should be done before deciding which projects to pursue, so that viable endeavors can be identified before work starts. Once a project is rolling, the rest of the team is held up while the interface designers do their research.

Using sftp to upload web pages

1. ftp and Why You Shouldn’t Use It
First, it would probably be helpful to define ftp and sftp. ftp is the File Transfer Protocol, used for downloading and uploading files to servers. sftp does the same thing, only it uses an encrypted channel between your computer and the server you are connecting to. Why is the encryption a big deal? Well, ftp sends your user name, password, and data in clear text, which is A Bad Thing. You may not care who snoops your data (it’s going to be published on the web anyway for the whole world to see, right?), but you should protect your login information to prevent somebody from doing malicious things to your data (or causing problems on my server).
2. sftp and Why You Should Use It
Well, I guess the basic question was answered above. More to the point, you have to use it. The new web server will not provide ftp service, for the reasons stated above. It does provide sftp, along with the other tools associated with ssh (the Secure Shell).
3. The problem With sftp
Because there is always a catch, right? The problem actually lies with some web authoring programs: namely, FrontPage and older versions of Dreamweaver do not support sftp. Dreamweaver MX 2004 and later versions support direct uploads using sftp.
4. Getting Around the Problem
It’s not really a problem with Dreamweaver MX 2004; you just need to change your ftp settings to sftp and point them at your directory on the new web server.
If you are using one of the other programs, it means an extra step. Use a program such as CoreFTP if you are a Windows user or Cyberduck if you are a Mac OS X user to transfer your local web directory to your server web directory. Both programs are easy to set up and use. It’s well worth the hassle to ensure that your pages don’t get defaced, replaced, or erased.
Mac users have another option, Fugu. It wraps up sftp, ssh, and scp in a nice graphical interface. A good write-up is available at NewsForge. If you’d rather skip the GUI entirely, the command-line sftp client that comes with ssh works too; a sample session follows.
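Here is a rough sketch of a command-line sftp session (the hostname and file names are made up; substitute your own):

sftp username@www.example.com
sftp> cd public_html
sftp> put index.html
sftp> quit

put uploads a single file. To copy a whole directory tree in one shot, newer versions of the OpenSSH sftp client support put -r, or you can use scp -r instead.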
5. One More Issue
To access your account on the web server, you will need to either be on campus or use the VPN client (available for download on campus).
Note: This is no longer true. All you need for our web server is an sftp client. A number of people have had trouble using the VPN client reliably from off-campus. Since ssh encrypts traffic, we don’t really need the VPN for that. What the VPN is useful for is allowing us to block remote log in attempts by Those Who Would Do Bad Things. Using strong passwords and protecting accounts will help combat that.

Executing programs at system startup

Some things need to run before a user logs in.  It could be a check for updates on an imaging server, clean-up of files, a splash screen with your logo on it, whatever.  Here’s how to do it for Windows, OS X, and GNU/Linux.

GNU/Linux

Copy your shell script to the /etc/init.d directory.  Each distribution does things a little differently, so check out one of the other scripts in that directory to see what options are typical, i.e. start, stop, restart, and how to implement them.  It may be easier to copy an existing one and then just have it call your script instead of whatever it is supposed to call.  If you don’t need that functionality, don’t bother.  No sense in making work for yourself.
Next, go to the /etc/rcS.d directory and create a symbolic link to the script you put in /etc/init.d.  Name the link S##yourscriptname, where ## is a number representing the order in which it will be run.  For example, if your script depends on some nfs share, then you would use a number higher than the mountnfs script.
Just remember to do all the stuff you normally would: make the script executable, change the ownership to root if necessary, etc.  Oh, and test it first, then test it again.  You don’t want to mess up the boot sequence and then have to boot into single-user mode to fix it.
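For example, installing a hypothetical script and making it run fairly late in the boot sequence (the script name and sequence number are placeholders; pick ones that fit your system):

cp yourscript /etc/init.d/yourscript
chown root:root /etc/init.d/yourscript
chmod 755 /etc/init.d/yourscript
ln -s /etc/init.d/yourscript /etc/rcS.d/S55yourscript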

OS X

You will need to be root to do many of these steps.  If you are VERY careful, you can type ‘sudo -s’ in Terminal, enter your admin password, and you’ve got yourself a root shell.  If you aren’t familiar with running as root, search around the web for all the bad things that can happen, then proceed with caution.  You’ve been warned, so don’t write to me if you hose your system.

Copy a folder from /System/Library/StartupItems to /Library/StartupItems.  Apple-installed stuff goes in /System/Library; all of your administrative stuff should go in /Library.  Pick one that is similar to what your script will do, i.e., if your script uses some network stuff, copy something that uses network stuff.  Change the name of the folder to the name of the script you intend to run from here.

If there is a Resources folder, remove all of the language folders except for the ones you intend to use.  In each Localizable.strings file, change the key and string that begin with “Starting something” to “Starting yourscript”.

Edit the StartupParameters.plist file so that the values make sense for your script.  I know that this is a vague statement, but your script may differ greatly from whatever example I may provide.  You’ll have to make changes to the Description, Provides, Requires, Uses, OrderPreference, and Messages.  It might be an XML file or a text file, but the keys are the same.

Edit and rename the existing script in the folder or copy in your own.  Make sure it’s executable and owned by root.  If you set the group ownership to admin and give group execute privileges, then your admin users can run the script while the system is running.
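For reference, here is a sketch of what the text (NeXT-style plist) version of StartupParameters.plist might look like; every name and value here is a placeholder, not something to copy verbatim:

{
  Description = "yourscript";
  Provides = ("yourscript");
  Requires = ("Network");
  OrderPreference = "Late";
  Messages = {
    start = "Starting yourscript";
    stop = "Stopping yourscript";
  };
}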

Windows

Windows startup and shutdown scripts are run after networking has been established and before networking is shut down, respectively.  Place your script in the appropriate folder, C:\WINDOWS\system32\GroupPolicy\Machine\Scripts\Startup (or Shutdown).  Run gpedit.msc from the Run prompt, expand Computer Configuration -> Windows Settings, then click on Scripts (Startup/Shutdown).  Double-click the one you want to add your script to, click Add in the dialog box that appears, then browse to your script.

Resources

http://www.microsoft.com/technet/archive/community/columns/tips/2kscript.mspx?mfr=true
http://www.kernelthread.com/mac/osx/arch_startup.html
http://www.osxfaq.com/Tutorials/LearningCenter/HowTo/Startup/index.ws
http://www.linuxjournal.com/article/7393

rsync backup script sample

#!/bin/bash
# Adapted from tridge@linuxcare.com, rsync.samba.org/examples.html
# This script does personal backups to a box using rsync over ssh.
# It does 7-day rotating incrementals.  The incrementals will go into
# subdirectories named after the day of the week, and the current full
# backup goes into a directory called 'current'.
# directory to back up
BDIR=/home/$USER
# name of the backup machine
BSERVER=your.backup.server
# excludes file contains a wildcard pattern per line of files to exclude
EXCLUDES=$HOME/path/to/excludes
# ssh options
#SSHI='"ssh -i $BDIR/.ssh/batchkey"'
# incremental directory, named after the day of the week (Monday, Tuesday, ...)
BACKUPDIR=`date +%A`
OPTS="--force --ignore-errors --delete-excluded --exclude-from=$EXCLUDES --delete --backup --backup-dir=$BACKUPDIR -a"
export PATH=$PATH:/bin:/usr/bin:/usr/local/bin
# clear out last week's incremental directory for today by syncing an
# empty directory over it
[ -d $HOME/emptydir ] || mkdir $HOME/emptydir
rsync --delete -a -e "ssh -i $BDIR/.ssh/batchkey" $HOME/emptydir/ $USER@$BSERVER:~/current/$BACKUPDIR/
rmdir $HOME/emptydir
# now the actual transfer
rsync $OPTS -e "ssh -i $BDIR/.ssh/batchkey" $BDIR $USER@$BSERVER:$BDIR/current
The excludes file:
*.mp3
*~
Temporary\ Internet\ Files
*.ogg
*.iso
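To make the backup automatic, call the script from cron. A sample crontab entry (the path is hypothetical) that runs it every night at 2:00 a.m.:

0 2 * * * /home/yourname/bin/rsync-backup.sh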

Socket programming in Python

Python and why it might be worth taking a look at
Python is an interpreted language, like perl or php. Code is written and saved in a file, made executable like a shell script (or not), then run by executing the file or invoking the interpreter with the file name as an argument. It runs on just about every OS in use and is extremely portable. The biggest advantage, however, is that the syntax is simple and results in code that is almost self-commenting. As for drawbacks, it will definitely run slower than C or C++, so if speed is absolutely critical then it’s probably not for your project. However, I have it on good authority that some big companies embed the python interpreter in their C software in case they need to quickly extend their project. Check out this article for the perspective of someone versed in many languages, as opposed to a lazy sys admin who uses bash and python and silently ignores his C++ books.
On to sockets
Python supports two types of sockets, UNIX (AF_UNIX) and Internet (AF_INET), and two types of protocols, TCP (SOCK_STREAM) and UDP (SOCK_DGRAM). Unix sockets are used for interprocess communication and the rest you’ve probably worked out already. The main module used in socket programming is socket and the function within this module used to create sockets is called, you guessed it, socket(). The socket function has the syntax:
socket(socket_family, socket_type, protocol=0)
So creating a TCP/IP socket might look like this:
socket.socket(socket.AF_INET, socket.SOCK_STREAM)
And a UDP socket like this:
socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
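To put those lines to work, here is a minimal sketch of a TCP echo server and matching client (Python 2-style; the port number is an arbitrary choice). The server:

import socket

# create a TCP/IP socket, bind it to a port, and wait for one connection
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('localhost', 9999))
server.listen(1)
conn, addr = server.accept()
# read up to 1024 bytes from the client and echo them back
data = conn.recv(1024)
conn.sendall(data)
conn.close()
server.close()

And the client, run from a second shell:

import socket

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('localhost', 9999))
client.sendall('hello')
print client.recv(1024)
client.close()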


Links
http://python.org/ Python homepage

Subversion the Debian way

Why use Subversion and why use Debian? Well, lots of people with a lot more knowledge about either or both have answered these questions, so I’ll leave the big arguments to them. Check the links below for some light reading on what some had to say. Why are we using them for CS 594? That one I have the answers to:
  1. As a group, we decided that Subversion (svn) is the revision control system on the rise and we want to use the latest and greatest (sorry cvs).
  2. Debian is the linux distribution of choice because I got stuck with the job of sys admin for the quarter and it happens to be my favorite. 15,490 packages a simple apt-get install away, automatic dependency resolution, debconf, ultra stable, blah blah blah. You get the idea.
OK, great. We are using svn and Debian stable. Now what? You might find it useful one day to set up your own repository or take Debian for a spin, so I’ll document the things I did to set up our svn server.
It goes a little like this:
  1. Grab the installation media
  2. Run through the installer
  3. Lock down the box
  4. Install subversion
  5. Do a little administrative stuff
Grab the installation media
Point your browser to http://www.us.debian.org/distrib/ and find the installation media that is right for you. Debian supports a lot of different architectures and provides four different distributions. For our purposes, we want to run Sarge, the current stable distribution, on i386 style equipment. Also, since we are only installing a bare minimum of packages, it is not necessary to download the full cd. Grab the netinstall image from http://www.us.debian.org/CD/netinst/ or pick one of the other low bandwidth options. Once you have the .iso downloaded, burn it to cd as a bootable image. I cheat and use Disk Utility on OS X because my G5 has more disk space than I know what to do with and, well, Disk Utility makes it brainless.
Run through the installer
Debian used to get picked on for its unfriendly installer. It still does, but the installer is not as hard as it used to be and will actually do most of the work for you. It’s not pretty like CentOS, but it does the job and lets you get your hands dirty if that’s what you want. Really, the only part that may get sticky is partitioning the disk. Guided partitioning will give you a pretty good setup with most of the space allocated to /home, but that didn’t jibe with the plans I had for putting the repositories in /var, so manual partitioning is called for. You aren’t stuck with fdisk anymore; the curses-type interface lets you click and select what you want to do.
Anyway, answer the questions about networking, host name, etc. When you get to the final step of installing additional packages, select manual package selection, but don’t run dpkg. There is an option to install packages later, and that’s what we want. At this point, the installer is going to download a bunch of packages and install them, then give you a login prompt.
Lock down the box
From a security standpoint, it is better to do the complete install disconnected from your network, lock down the box, then connect it to the network. We didn’t do that, but we had better do something before we start running any services. If you were watching the packages scroll down the screen, you noticed ssh was installed and you didn’t even have to ask for it. Better edit /etc/hosts.deny and put in an entry like ALL: ALL. Then disable root log in for ssh in the file /etc/ssh/sshd_config. I want to know what’s going on with my system, so now is the time to download chkrootkit, tiger, Bastille, and scanlog. It’s pretty easy to guess what chkrootkit does. Tiger is a security auditing system, Bastille is a system-hardening program, and scanlog alerts you to nasties portscanning your server. Nmap is a good idea, too, but you ought to run it both locally and from another host so you can see what your local system sees and what the outside world sees.
Once you think things are OK, edit /etc/hosts.allow and add an entry for ssh access.
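The entries are one-liners. The address range below is just an example; allow whatever networks you actually connect from:

In /etc/hosts.deny:
ALL: ALL

In /etc/hosts.allow:
sshd: 192.168.1.

And in /etc/ssh/sshd_config:
PermitRootLogin no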
Install subversion
apt-get install subversion subversion-tools
Yep, really is that simple.
Do a little administrative stuff
Now that you have everything above done, people will need to use it. I’m going to create a directory called /var/local/repository and it will contain a separate repository for each project. You have some choices about how to handle this, so go read up about it in the subversion documents. One of svn’s big advantages is the ability to use Apache 2.0 as the connection and authentication mechanism, but I don’t want to run a web server just for this. We have to run ssh to administer the box, so we’ll use ssh as the connection mechanism and leave only one service open to the world. svn will tunnel through ssh, so all we need to do is create user accounts, an svn system user and group, add the users to that group, change the ownership of /var/local/repository to svn, change the mode and set GID on the repository, and then start using it.
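Roughly, the commands look like this on Debian (the user and project names are made up; adjust to taste):

adduser --system --group svn     # the svn system user and group
adduser alice                    # one of your developers
adduser alice svn                # add her to the svn group
mkdir -p /var/local/repository
svnadmin create /var/local/repository/project1
chown -R svn:svn /var/local/repository
chmod -R g+rw /var/local/repository
find /var/local/repository -type d -exec chmod g+s {} \;

The set-GID bit on the directories keeps files created in the repository owned by the svn group, so everyone in the group can read and write it.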
Almost done
So what does a client do to connect to the repository? Create a local folder that will hold the local copy of the repository. Connect to the server with a command like
svn checkout svn+ssh://host.example.com/repository/project1
from within your local repository folder and you’re set.
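From there, a few commands cover most day-to-day use (the file name and commit message are placeholders):

svn add newpage.html             # put a new file under version control
svn commit -m "Added newpage"    # send your changes to the repository
svn update                       # pull in everyone else's changes
svn status                       # see what you've changed locally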
Links
http://www.us.debian.org/ Home page for the Debian project
http://www.us.debian.org/distrib/ Debian distributions page
http://subversion.tigris.org/ Subversion project home page
http://svnbook.red-bean.com/en/1.1/index.html Online documentation for subversion 1.1
To Do
Need to add more examples to Almost Done; it’s really just a placeholder at the moment. A page with common commands would be useful. Oooh, some pretty pictures would make this a bit more exciting, too.