Saturday, January 23, 2010

Consent for my own software to look at my data???

Here in the US (and I presume elsewhere) the annual rite of passage of doing one's taxes is upon us yet again. Once you've done something crazy like getting married, having kids or buying a house, the whole process gets more and more complicated as you try to minimize the amount of taxes you owe Uncle Sam.

I've always done my own taxes and for the past 10 or 15 years, I've used Intuit's TurboTax software to do so. I still can't bring myself to do the taxes online -- I can't help feeling that there's just something wrong about not keeping that data in-house.

Anyway, yesterday I installed TurboTax to start working on my 2009 taxes (yeah, I'm a bit early, but I like to do it piecemeal as I receive tax reports and have spare cycles here and there).

Following the installation, I was prompted with the following consent screen:

If you look carefully, essentially the screen is asking if you give consent to release data to Intuit so that they can figure out whether they should offer you extra paid tax preparation services (paid out of your return) and/or a chance for you to get some portion of your return on a debit card. So they want to use the information on your return to market additional services to you.

What isn't clear to me from the disclosure is: are you actually giving the information to Intuit (in other words, is it being transferred to an Intuit server off-system) or is the consent about the software that you've purchased and installed on your computer looking at the data locally?

If the former, then I think that they should explicitly say that the data is being transferred to an Intuit server as that isn't clear in the disclosure.

If the latter, why the heck is that necessary? It's my software that I purchased and it's keeping the data locally on my system. Intuit never sees the data unless I specifically send it to them for one reason or another. If this is really required by law, how does that match up with the "instant refund" or "refund anticipation loans" offered by the likes of H&R Block or Jackson Hewitt? Do they have to get you to sign a similar consent form before they can "notice" that you're getting a refund and offer to give it to you instantly (at great cost to you, of course)?

Even if it is the crazy latter situation, the consent should clearly state that the data is not leaving the system, that it is only being used by the software I just installed.

In any case, I did not consent to any data release... I'll wait for my money to show up when it shows up (assuming I even get a refund, which isn't always the case).


Sunday, January 10, 2010

Setting up a new ubuntu server

I've been running my own mail server for close to 20 years... Through the years, I've gone from Interactive Unix (how many of you remember that one!) to Red Hat Linux to Fedora Linux and now I'm moving to Ubuntu (in part thanks to the strong recommendations I've gotten from friends, especially Pat Patterson).

I host several services on my server and because we're at the end of a relatively slow pipe, I use a dedicated server hosted at Superb Hosting. I use a dedicated server rather than the more typical web hosting or shared hosting because it gives me better control over my services and because I host a bunch of email domains for friends (some of which I simply forward to their current ISP and some who actually get their mail from my system).

So, I needed to set up the following services on my server:

  • DNS services for the 12 or so domains I manage (2 of my own and the rest friends & family).
  • Web server for my personal site.
  • Email services for something like 12 domains as well.

Sounds simple, doesn't it?

Well, it wasn't that simple, but mostly because a) I was learning new ways that things are done on Ubuntu vs Fedora, b) the tweaks for how I want to do things typically involve manual configuration changes that aren't always easily discerned from reading the man pages, and c) I like to understand the why as well as the how when doing administrative stuff so I spend a lot of time reading/reviewing/searching as I make my changes.

BTW - I'm not only willing, but actually want to do this so that I can keep my hands a bit dirty (and maintain some basic understanding of the current technologies used on servers). At work, they keep my grubby little hands far away from the system administration side of the house.

Anyway, I thought it would be useful to document what I went through as I set up the server, as it may help others trying to do the same things.

One note about the commands shown below: I logged in as root to do the configuration, so you don't see "sudo (command)" for all of the various commands. Some would say this might be a more dangerous way to configure the system and I would agree for onesie-twosie administrative commands. However, for a long session where you're doing nothing other than administrative commands, sudo just gets in the way. And yes, you need to be careful when you're logged in as root.

The following sections are presented below


OS Updates

First step with any new system is to ensure that I have the latest and greatest software installed -- this is especially important on an internet-visible server.

This involved running the following commands:

apt-get update         # to update the apt-get configuration/data files
apt-get dist-upgrade   # to upgrade all installed packages to latest versions

This made sure I had the latest patches for this release of the OS. However, I wanted also to make sure I had the latest OS version. For Ubuntu, they have two development lines for servers: a somewhat frequently changing/evolving line and a more stable Long Term Support (LTS) line. Both lines get security patches regularly but LTS gets them for several years longer while the fast changing line will more frequently require you to upgrade to the latest OS version for patches.

Given what I do with the server, using the LTS line is the right thing for me to do (and that is the version that was installed by my provider). So I ran the following commands to ensure I had the latest version:

apt-get install update-manager-core
do-release-upgrade

Which reported that there was "No new release found", which is correct as 8.04 LTS is the latest LTS.

If, on the other hand, I wanted the latest OS rev (not just the latest LTS OS rev), I could have edited the file:

/etc/update-manager/release-upgrades

and changed the line "Prompt=lts" to "Prompt=normal"


Miscellaneous Tools

As I went through the installation and setup, I found a number of tools were missing that I had to install to do the things I wanted to do, so I'll list them here...

  1. System V Configuration files

    I like to use the System V commands for managing the system (including the service command to start/stop init.d services).

    apt-get install sysvconfig
  2. Make

    I use a lot of Makefiles for managing the build and installation of software and packages. I was a bit surprised that my server didn't include make by default, but I presume that was because it is a server and doesn't have the development system installed either.

    apt-get install make

Firewall

First thing to do is get the firewall up and running. While I plan to tightly control which services are exposed on which ports, I still feel much more comfortable having an explicit list of ports which are accessible from the internet at large. I also like to set up and test services locally while they are still blocked (including only opening up access from my own systems so I can even do remote testing without worrying about others getting into the server while it is a work-in-progress).

I use an iptables-based firewall that is manually configured for the system. I've been using pretty much the same setup for years though I continuously tweak it. The script is written as an init.d service script so that I can install it there and have it run automatically at system startup.

In addition to the typical port protections, I also keep a blacklist of IPs for which I block all access to my server. Systems get on this list when I see that they are trying to hack into my system via repeated SSH login attempts.
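
Candidates for that blacklist can be pulled out of the sshd log. The following is only a minimal sketch, not the procedure I actually follow: it assumes OpenSSH's usual "Failed password ... from <ip> port ..." log lines, and the function name, the log path and the threshold are all made up for illustration.

```shell
# Sketch: list source IPs with at least MIN failed SSH password attempts.
# Assumes sshd log lines of the form
#   "... sshd[123]: Failed password for <user> from <ip> port <n> ssh2"
# list_ssh_offenders is a hypothetical helper, not part of the firewall script.
list_ssh_offenders() {
    logfile=$1      # e.g. /var/log/auth.log (assumed location)
    min=$2          # minimum number of failures before an IP is reported
    awk -v min="$min" '
        /sshd/ && /Failed password/ {
            ip = ""
            # take the word after the LAST "from" so odd user names cannot fool us
            for (i = 1; i < NF; i++)
                if ($i == "from") ip = $(i + 1)
            if (ip != "") hits[ip]++
        }
        END { for (ip in hits) if (hits[ip] >= min) print ip }
    ' "$logfile" | sort
}
```

Something like "list_ssh_offenders /var/log/auth.log 5" would print one address per line. I would still review the output by hand before adding anything to the blocked IP file.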

The core iptables rules in the script include:

#
# Create a new chain named "filter"
#
iptables -N filter                # add the new chain

#
# allow established connections
#
iptables -A filter -m state --state ESTABLISHED,RELATED -j ACCEPT

#
# if there are any ports to be dropped
#
if [ -f "${FILE_DroppedPorts}" ]; then
  grep -v "^#" "${FILE_DroppedPorts}" | while  read proto port
  do
      #
      # for non-blank lines
      #
      if [ x${proto} != x ]; then
          iptables -A filter -i eth0 -p ${proto} --dport ${port} -j DROP
      fi
  done
fi

#
# if there are any blocked IPs
#
if [ -f "${FILE_BlockedIPs}" ]; then
  grep -v "^#" "${FILE_BlockedIPs}" | while  read ip
  do
      if [ x${ip} != x ]; then
          iptables -A filter -s ${ip} -j LOG
          iptables -A filter -s ${ip} -j DROP
      fi
  done
fi

#
# allow ssh to this host from anywhere
#
iptables -A filter -p tcp --dport ssh -j ACCEPT

#
# allow HTTP/HTTPS to this host
#
iptables -A filter -i eth0 -p tcp --dport http  -j ACCEPT
iptables -A filter -i eth0 -p tcp --dport https -j ACCEPT

#
# allow SMTP, SMTPS and SMTP/TLS to this host
#
iptables -A filter -i eth0 -p tcp --dport smtp  -j ACCEPT
iptables -A filter -i eth0 -p tcp --dport smtps -j ACCEPT
iptables -A filter -i eth0 -p tcp --dport 587   -j ACCEPT

#
# allow IMAPs & POP3s to this host
#
iptables -A filter -i eth0 -p tcp --dport 993 -j ACCEPT
iptables -A filter -i eth0 -p tcp --dport 995 -j ACCEPT

#
# Allow DNS lookups to this host
#
iptables -A filter -i eth0 -p tcp --dport domain -j ACCEPT
iptables -A filter -i eth0 -p udp --dport domain -j ACCEPT
iptables -A filter -i eth0 \
             -p udp --sport domain --dport 1024: -j ACCEPT

#
# allow outgoing ftp connections
#
iptables -A filter  -p tcp --sport 21 \
              -m state --state ESTABLISHED -j ACCEPT
iptables -A filter -p tcp --sport 20 \
              -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A filter -p tcp --sport 1024: --dport 1024:  \
              -m state --state ESTABLISHED -j ACCEPT

#
# let people ping us
#
iptables -A filter -p icmp -j ACCEPT

#
# Log all else
#
iptables -A filter -j LOG

#
# drop all else
#
iptables -A filter -j DROP

#
# send all incoming (INPUT) traffic through the filter chain
#
iptables -A INPUT   -j filter

If you're interested, you can download the script and associated files here.

Note that at this point, while I'm setting up the system, many of those ports opened above are commented out and then, as I install the various components (such as Apache2) I open the respective port.
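
For a quick sanity check of the two list files before loading them, the same parsing loops can print the rules instead of applying them. This is a sketch that assumes the file formats implied by the read statements in the script ("proto port" per line for dropped ports, one address per line for blocked IPs); the function names are illustrative and not part of the downloadable script.

```shell
# Sketch: print (do not run) the rules that the list files would generate.
# Lines starting with "#" and blank lines are skipped, as in the real script.

# $1 = file with "proto port" per line
dropped_port_rules() {
    grep -v '^#' "$1" | while read -r proto port; do
        [ -n "$proto" ] || continue
        echo "iptables -A filter -i eth0 -p $proto --dport $port -j DROP"
    done
}

# $1 = file with one IP address (or CIDR block) per line
blocked_ip_rules() {
    grep -v '^#' "$1" | while read -r ip; do
        [ -n "$ip" ] || continue
        echo "iptables -A filter -s $ip -j LOG"
        echo "iptables -A filter -s $ip -j DROP"
    done
}
```

Reading the printed rules over once is a cheap way to catch a typo in a list file before it ends up locking me out of a remote box.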

Once completed, I installed the script in /etc/init.d using the install directive in my Makefile (make install) and then used the following command to set up the necessary /etc/rc*.d files to ensure the firewall started as necessary when the system was booted.

update-rc.d firewall defaults

Backup

Whether or not we actually do it, we all know that we should be backing up our systems and data. This is especially true for remote systems where we don't have easy local access.

My hosting provider does have backup options, but they cost recurring money that I don't want to spend if I don't have to. So, my solution is to back up my remote server onto one of my home servers (where I have TBs of space anyway).

Since I have a firewall at home, I have to do the backup across a two step ssh tunnel similar to what I described in Backing up using SSH & Rsync. The first connection goes from my remote server to my firewall and the second connection goes from the remote server through the first connection to my backup server. I then rsync a number of directories on the remote server to the backup server including:

/etc, /var, /usr/local, /home

For security reasons, I require private key authentication for this connection on both my gateway and my backup server. I use a user account which has no login shell and no login directory, and I configure it so that the only service that can be accessed is the rsync service. Not perfect, but it's good enough that I can get some sleep at night.

One problem with this setup is that the second ssh tunnel connects to a port on localhost in order to establish the connection to the remote system, which can be a problem if there are other ssh tunnels set up similarly (they would all share one host key entry for localhost). To get around that, I add an alias for my backup server to the localhost entry in the /etc/hosts file. So, rather than connecting to localhost, the second tunnel connects to the host backup_server and thus keeps all of the SSH host keys separate.
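
The shape of the two steps can be sketched as a function that only prints the commands. This is not my actual script: the gateway name, the internal address, the forwarded port, the "backup" account and the /backups/ destination are all placeholders, and only the backup_server alias comes from the setup described above.

```shell
# Sketch: print the commands for a two-step backup (tunnel first, rsync second).
# Nothing is executed; every name except "backup_server" is a placeholder.
backup_commands() {
    gateway=$1      # public name of the home firewall (placeholder)
    inner=$2        # backup server's address on the home LAN (placeholder)
    port=$3         # local port that is forwarded through the first connection
    shift 3
    # Step 1: forward a local port through the firewall to the backup server's sshd.
    echo "ssh -f -N -L $port:$inner:22 backup@$gateway"
    # Step 2: rsync each directory over the forwarded port. backup_server is the
    # /etc/hosts alias for localhost, so its host key is stored under its own name.
    for dir in "$@"; do
        echo "rsync -azR -e 'ssh -p $port' $dir backup@backup_server:/backups/"
    done
}
```

For example, "backup_commands gw.example.net 192.168.1.20 2222 /etc /var /usr/local /home" prints one ssh line followed by four rsync lines. The -R option keeps the full source path under the destination directory.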

If you're interested, you can download a modified version of the script (I removed any credentials & system names) from here.


Bind9 (DNS Server)

I host DNS for most of the domains for which I host mail (a few of my friends host their own DNS, but use my mail server). A long time ago, I wrote a shell script that creates the necessary configuration files for the set of domains I manage (which makes it easy to add new domains which are following the same rules and makes it easy to change things around when I change my configuration).

Preparation for the move

Since nameserver changes can take some time to propagate through the internet, this is the first service that I installed, configured and exposed on the new system. In preparation for the move, I went to my old nameserver and cranked down the caching settings for the domains I hosted there in order to reduce the propagation time. My typical settings are:

@     IN    SOA     mydomain.com.  postmaster.mydomain.com. (
      2010010200      ; serial number
      86400           ; refresh every 24 hours
      3600            ; retry after an hour
      604800          ; expire after 7 days
      86400           ; keep 24 hours
)

In preparation for the move, about a week in advance I reduced these settings to:

@     IN    SOA     mydomain.com.  postmaster.mydomain.com. (
      2010010800      ; serial number
      3600            ; refresh every hour
      1800            ; retry after a half hour
      7200            ; expire after 2 hours
      3600            ; keep 1 hour
)

And finally, the day before the switch, I moved to:

@     IN    SOA     mydomain.com.  postmaster.mydomain.com. (
      2010010900      ; serial number
      1800            ; refresh every half hour
      1800            ; retry after a half hour
      600             ; expire after 10 mins
      600             ; keep 10 mins
)
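
Each of those changes needs a higher serial number or the secondaries will ignore it. The serials above look like the common YYYYMMDDNN convention, so bumping one can be sketched as below. The next_serial helper is hypothetical (it is not part of my zone generation script), and it does not handle more than 99 changes in a day or a stored serial that is dated in the future.

```shell
# Sketch: compute the next zone serial under the YYYYMMDDNN convention.
# next_serial is a hypothetical helper, not part of BIND or of my scripts.
next_serial() {
    current=$1                      # existing serial, e.g. 2010010800
    today=${2:-$(date +%Y%m%d)}     # date stamp to use; defaults to today
    if [ "${current%??}" = "$today" ]; then
        # already changed today: bump the two-digit counter
        echo $(($current + 1))
    else
        # first change on this date: start the counter at 00
        echo "${today}00"
    fi
}
```

So "next_serial 2010010800 20100108" gives 2010010801, while the same serial on the following day gives 2010010900.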

Installation and configuration

I installed the nameservice daemon software and utilities using:

apt-get install bind9 dnsutils bind9-doc resolvconf

I then copied my setup files from the old server to the new server. The way that named.conf is managed has changed. On my old server all of the settings were in one file, /etc/named.conf. On Ubuntu, the main file (/etc/bind/named.conf) is intended to be left unchanged: the local options are supposed to be placed into /etc/bind/named.conf.options while the host references are intended to be placed into /etc/bind/named.conf.local. So I changed my scripts to match the new model and modified the Makefile to correctly install the new components.

I've always run my named (the nameservice daemon) within a chrooted environment and every time I do this I have to yet again figure out what pieces need to be there in order to get things working. So this time, I wrote a CreateChroot.sh script and ran it to create the chroot environment for me (and now I don't have to figure it out from scratch the next time!). In addition to creating the chroot environment, I had to change the OPTIONS directive in /etc/default/bind9 to include "-t /var/cache/bind" so that the file now looks like:

OPTIONS="-u bind -t /var/cache/bind"
#OPTIONS="-u bind"
# Set RESOLVCONF=no to not run resolvconf
RESOLVCONF=yes
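
I'm not reproducing CreateChroot.sh here, but the directory part of such a script can be sketched as follows. The list of directories is an assumption based on the usual layout for a chrooted bind9 on Debian-style systems, and the function name is made up. The device nodes and ownership changes need root, so they appear only as comments.

```shell
# Sketch: create the directory skeleton for a chrooted named.
# The directory list is an assumption (a common bind9 chroot layout),
# not a copy of CreateChroot.sh.
make_chroot_skeleton() {
    root=$1         # chroot directory, e.g. /var/cache/bind as used above
    for dir in etc/bind dev var/cache/bind var/run/bind/run; do
        mkdir -p "$root/$dir" || return 1
    done
    # As root, the chroot would typically also need device nodes and ownership:
    #   mknod "$root/dev/null"   c 1 3
    #   mknod "$root/dev/random" c 1 8
    #   chown -R bind:bind "$root/var"
}
```

The configuration and zone files then have to be copied (or generated) into the etc/bind directory inside the chroot, since named can no longer see the real /etc/bind once it has been started with -t.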

In first setting up the new server, I made no changes other than to add a new entry for my new server. So my new nameserver had pretty much the same host entries that were on the old server. So I ran my script for creating and installing my named configuration and restarted the bind9 service.

At this point, I opened the DNS TCP & UDP ports on my firewall so that I could accept incoming nameservice requests. In order to test the service, I went to my old server and used nslookup to test the new server:

# nslookup
> server newns.mydomain.com
Default server: newns.mydomain.com
Address: 192.168.169.11#53
> www.mydomain.com
Server:         newns.mydomain.com
Address:        192.168.169.11#53

Name:   www.mydomain.com
Address: 192.168.169.11
> mail.mydomain.com
Server:         newns.mydomain.com
Address:        192.168.169.11#53

Name:   mail.mydomain.com
Address: 192.168.169.11
>exit

This showed that things were working as I intended.

The Switchover

At this point, everything was ready to go, so I went to my domain registry (Network Solutions) and changed the host records for my nameservers to make the new nameserver my primary DNS server and my old server the secondary server.

This worked fine (though they warned me it could take 72 hours to propagate) and I ran a bunch of tests from my home network, my work network and my old server and everything was peachy keen.


Web Server

I run a web server for my own family web site. It's all hand-coded html (yeah, kinda old-fangled, but I haven't had the time, energy or inclination to re-architect it). Setting it up on the new server was pretty simple.

First step was to copy over the directory hierarchy from the old server to the new server. Just tar'd it up and scp'd it over to the new server and untar'd it within the /home/www directory.

Next step involved getting apache2 installed...

apt-get install apache2

The configuration for the web servers is located in /etc/apache2/sites-available, which comes with a single default file. I renamed this file to be www.cahillfamily.com (allowing for more sites at some point in the future) and edited that file to match up the settings from the old server.

Server Side Includes (SSI)

SSI is a capability on the server which allows an html file to include html from other files on the same web server. I use this feature extensively to maintain a consistent menu structure by placing it in one file and including it in all the html files on the server.

To enable this feature, I did the following:

  1. Set the Includes option (Options +Includes) within the configuration section for my virtual host.
  2. Turned on the XBitHack directive (XBitHack on) as well. This allows me to indicate to the server that there's an include directive in the file by simply setting the executable bit on the file (rather than having to have a particular suffix on the html file).
  3. Enabled mod_include by running the following command:
    a2enmod include
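
To make that concrete, here is a small sketch that builds a two-file demo: a shared menu and a page that pulls it in with an include directive. The file names and the make_ssi_demo helper are made up for illustration. The include syntax itself is the standard mod_include form.

```shell
# Sketch: create a shared menu and a page that includes it via SSI.
# With XBitHack on, the executable bit is what tells Apache to parse the page.
make_ssi_demo() {
    docroot=$1      # directory served by the virtual host (any test directory works)
    printf '<ul><li><a href="/">Home</a></li></ul>\n' > "$docroot/menu.html"
    cat > "$docroot/index.html" <<'EOF'
<html><body>
<!--#include virtual="/menu.html" -->
<p>Page content goes here.</p>
</body></html>
EOF
    # mark the page as "contains includes"; menu.html itself stays non-executable
    chmod +x "$docroot/index.html"
}
```

Changing menu.html then changes the menu on every page that includes it, which is the whole point of the exercise.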

Proxies

I run a few proxy servers on my remote server that I have found useful when I'm behind some crazy firewalls or when an ISP has tight controls on the number of outgoing connections -- I've run into ratcheted down connection limits on my former home ISP (RoadStar Internet) and at some hotels while on the road.

So I set up the proxies on my remote server, SSH to the server and then tunnel my services through that server.

WARNING: You have to be very careful when you setup proxies so that you don't end up creating an open proxy that others can use to make it appear that bad things are coming from your server. If you do set one up, do so carefully.

Socks 5 Proxy

Socks 5 is used for proxying many of my different Instant Messenger connections (I have like 5 of them). For Ubuntu, the common/best one seems to be the Dante-Server which I installed using:

apt-get install dante-server

I configured it to only allow connections from the local system (since I will have an SSH tunnel to the server). This prevents others from using it unless they have internal access to my server.

*** /etc/danted.conf.orig       2009-12-31 11:29:41.000000000 -0500
--- /etc/danted.conf    2009-12-31 11:39:16.000000000 -0500
***************
*** 37,43 ****

# the server will log both via syslog, to stdout and to /var/log/lotsoflogs
#logoutput: syslog stdout /var/log/lotsoflogs
! logoutput: stderr

# The server will bind to the address 10.1.1.1, port 1080 and will only
# accept connections going to that address.
--- 37,43 ----

# the server will log both via syslog, to stdout and to /var/log/lotsoflogs
#logoutput: syslog stdout /var/log/lotsoflogs
! logoutput: syslog

# The server will bind to the address 10.1.1.1, port 1080 and will only
# accept connections going to that address.
***************
*** 45,54 ****
--- 45,58 ----
# Alternatively, the interface name can be used instead of the address.
#internal: eth0 port = 1080

+ internal: 127.0.0.1 port=1080
+
# all outgoing connections from the server will use the IP address
# 195.168.1.1
#external: 192.168.1.1

+ external: xx.yy.zzz.aaa
+
# list over acceptable methods, order of preference.
# A method not set here will never be selected.
#
***************
*** 57,66 ****
#

# methods for socks-rules.
! #method: username none #rfc931

# methods for client-rules.
! #clientmethod: none

#or if you want to allow rfc931 (ident) too
#method: username rfc931 none
--- 61,70 ----
#

# methods for socks-rules.
! method: username none #rfc931

# methods for client-rules.
! clientmethod: none

#or if you want to allow rfc931 (ident) too
#method: username rfc931 none
***************
*** 106,112 ****
# can be enabled using the "extension" keyword.
#
# enable the bind extension.
! #extension: bind


#
--- 110,116 ----
# can be enabled using the "extension" keyword.
#
# enable the bind extension.
! extension: bind


#
***************
*** 162,167 ****
--- 166,178 ----
#     method: rfc931 # match all idented users that also are in passwordfile
#}

+ #
+ # Allow any connections from localhost (they will get here via SSH tunnels)
+ #
+ client pass {
+       from: 127.0.0.1/32 port 1-65535 to: 0.0.0.0/0
+ }
+
# This is identical to above, but allows clients without a rfc931 (ident)
# too.  In practise this means the socksserver will try to get a rfc931
# reply first (the above rule), if that fails, it tries this rule.
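
Given the warning about open proxies, a quick check that the daemon really is bound to the loopback address only is worth having. The sketch below just inspects the active "internal:" lines of a danted.conf style file; the function name is made up and it does not validate the rest of the configuration.

```shell
# Sketch: succeed only if the config has at least one active "internal:" line
# and every such line binds to 127.0.0.1. Commented-out lines are ignored.
check_internal_loopback() {
    conf=$1
    lines=$(grep '^[[:space:]]*internal:' "$conf") || return 1   # no active line at all
    if echo "$lines" | grep -qv '127\.0\.0\.1'; then
        return 1                                                  # some other address/interface
    fi
    return 0
}
```

With that in place, clients reach the proxy through a tunnel along the lines of "ssh -N -L 1080:127.0.0.1:1080 user@server" (user and server being placeholders) and point their SOCKS setting at localhost port 1080.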

Web (HTTP/HTTPS) Proxy Server

Since I already had the web server up and running, setting up a web proxy was easy. First I had to ensure that the necessary modules were installed and enabled:

a2enmod proxy
a2enmod proxy_connect
a2enmod proxy_ftp
a2enmod proxy_http

Then I edited the /etc/apache2/httpd.conf file and added the following entries:

ProxyRequests On

<Proxy *>
  AddDefaultCharset off
  Order deny,allow
  Deny from all
  Allow from 127.0.0.1
</Proxy>

AllowConnect 443 563 8481 681 8081 8443 22 8080 8181 8180 8182 7002

The AllowConnect option is necessary if you're going to proxy other connections (such as HTTPS). Most of those numbers are legacy from some point in the past. The really necessary one is 443 (for HTTPS); some of the 8xxx ones were from when I was doing some web services testing from behind a firewall at work (so I could invoke the web service endpoint from my test application). Not sure about all the others, but I'm not too worried about it since I only accept proxy requests from the local system.


Mail Server

Setting up a mail server can be somewhat complex, especially when you throw in the fact that I was moving a running mail server to a new system and adding new client capabilities. On my old server, all of my users had to SSH into my server with private key authentication and then tunnel POP & SMTP over the SSH connection. This could be a pain (to say the least) and restricted access for clients like the iPhone or other devices. Most of my users (family & friends) are using an ssh tunnelling product from VanDyke that was discontinued back in 2004.

Installation

First step is to install the necessary components. Some of these were already installed with the server OS package (e.g. Postfix) but there's nothing wrong with making sure...

apt-get update
apt-get install postfix
apt-get install courier courier-pop-ssl courier-imap-ssl courier-doc 
apt-get install spell mail

Before I started actually accepting and processing mail, I thought it best to get the client protocols all working, so on to the clients.

Mail Clients

I needed to enable support for your typical mail clients such as Outlook and Thunderbird (which require IMAP or POP3 to retrieve mail and SMTP to send mail) as well as web browser clients. In the past, I have not supported web clients and I have required mail clients to tunnel their POP3 & SMTP over ssh tunnels. With the new server, I wanted to allow access without requiring ssh tunnels so that other clients (such as my iPhone) that didn't have ready support for ssh tunneling could get to the mail server. I also wanted to add browser based support so that people could check their email from other locations (such as a friend's computer).

This involved the following steps:

Secure Sockets Layer (SSL)

For remote access to my server I needed to enable SSL so that user credentials were protected. My intent was to enable SSL on all the standard mail client protocols (SMTP, IMAP and POP) and to enable browser based access to mail via HTTPS and a web server based mail client.

Certificate Generation

In order to support SSL, I needed to get an SSL certificate. I could have created my own certificate and signed it myself, but that would have led to error messages from the clients telling my users that perhaps they shouldn't trust my server. Instead, I signed up for an SSL certificate from GoDaddy which was running a special for $12.95/year for up to 5 years.

In order to create the certificate, I had to generate my private key and then a certificate signing request using the following commands:

*** make sure openssl is installed
# apt-get install openssl

*** Generate 4096 bit RSA server key
# openssl genrsa -des3 -out server.key 4096
Generating RSA private key, 4096 bit long modulus
.............................................................................++
................................................................................
......................................++
e is 65537 (0x10001)
Enter pass phrase for server.key: abcd
Verifying - Enter pass phrase for server.key: abcd

*** Generate certificate signing request for server key (note that the
*** "Common Name" must be the name of the host that the clients will connect
*** to if you don't want to get ssl errors)
# openssl req -new -key server.key -out server.csr
Enter pass phrase for server.key: abcd
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]: US
State or Province Name (full name) [Some-State]: Virginia
Locality Name (eg, city) []: Waterford
Organization Name (eg, company) [Internet Widgits Pty Ltd]: Cahills
Organizational Unit Name (eg, section) []:
Common Name (eg, YOUR name) []: mail.cahillfamily.com
Email Address []:

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:

At this point, I took the server signing request, server.csr, and sent it to GoDaddy to get them to sign it and create my certificate. If, on the other hand, I wanted to do a self-signed certificate, I would have performed the following steps:

*** sign the csr using our own server key (making this a self-signed cert)
# openssl x509 -req -days 1825 -in server.csr \
  -signkey server.key -out server.crt
Signature ok
subject=/C=US/ST=Virginia/L=Waterford/O=Cahills/CN=mail.cahillfamily.com
Getting Private key
Enter pass phrase for server.key: abcd

To test this, I configured Apache2 to support SSL and tested access to https://mail.cahillfamily.com. I first needed to enable the SSL module using the following command:

a2enmod ssl

I took the server key and server certificate and placed them into a secure non-standard location (no need to advertise where) and set the access modes on the directory to restrict it to root only. In order for the server key to be used without a pass phrase, I ran the following commands to remove the pass phrase from the file:

mv server.key server.key.safe
openssl rsa -in server.key.safe -out server.key
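
Before pointing any service at the files, it's worth confirming that the certificate that came back really belongs to this key. A minimal sketch, assuming an RSA key without a pass phrase (a protected key would make openssl prompt); the key_matches_cert name is made up. It simply compares the modulus that openssl reports for each file.

```shell
# Sketch: succeed only if the RSA private key and the certificate share a modulus.
# $1 = private key file (no pass phrase), $2 = certificate file
key_matches_cert() {
    key_mod=$(openssl rsa  -noout -modulus -in "$1" 2>/dev/null) || return 1
    crt_mod=$(openssl x509 -noout -modulus -in "$2" 2>/dev/null) || return 1
    # both commands print "Modulus=<hex>"; an empty value means something went wrong
    [ -n "$key_mod" ] && [ "$key_mod" = "$crt_mod" ]
}
```

Usage would be something like "key_matches_cert server.key server.crt && echo match". A mismatch usually means the CSR that was submitted came from a different key than the one on disk.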

I copied the default Apache2 site file into one for mail.cahillfamily.com and set it up using the following commands:

cp /etc/apache2/sites-available/default /etc/apache2/sites-available/mail.cahillfamily.com
ln -s /etc/apache2/sites-available/mail.cahillfamily.com /etc/apache2/sites-enabled/mail.cahillfamily.com

I then edited the configuration file to enable SSL and to point to the newly installed certificate and key files:

NameVirtualHost *:443
<VirtualHost *:443>
      ServerAdmin webmaster
      ServerName mail.cahillfamily.com

      DocumentRoot /home/www/mail.cahillfamily.com
      ErrorLog /var/log/apache2/error.log

      # Possible values include: debug, info, notice, warn, error, crit,
      # alert, emerg.
      LogLevel warn

      CustomLog /var/log/apache2/access.log combined
      ServerSignature On

      SSLEngine On
      SSLCertificateFile /path-to-ssl-files/server.crt
      SSLCertificateKeyFile /path-to-ssl-files/server.key
</VirtualHost>

I also wanted to automatically redirect any http: access to mail.cahillfamily.com to https: access, so I added the following section to the default site file which uses the RedirectPermanent directive to automatically redirect access on port 80:

<VirtualHost *:80>
      ServerAdmin webmaster
      ServerName  mail.cahillfamily.com
      RedirectPermanent / https://mail.cahillfamily.com/
</VirtualHost>

IMAP and POP

After poking about some, I came to the conclusion that the right mail server for me to use to expose IMAP and POP interfaces for my mail clients is the Courier Mail Server.

Courier requires that you use the Maildir structure for user mailboxes while Postfix uses the mbox structure by default. So I changed Postfix to use the Maildir structure by adding the following setting to /etc/postfix/main.cf:

home_mailbox = Maildir/

I manually created an empty Maildir structure for all my user accounts.
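
A Maildir is just three subdirectories, so the manual step can be sketched as below. The make_maildir name is made up; Courier also ships a maildirmake command that does the same job. When this is run as root, the result still has to be chown'd to the account that owns the mailbox.

```shell
# Sketch: create an empty Maildir (cur, new, tmp) under a home directory.
make_maildir() {
    base=$1/Maildir     # $1 = the user's home directory
    mkdir -p "$base/cur" "$base/new" "$base/tmp" &&
    chmod -R 700 "$base"    # mail should be readable by its owner only
}
```

Looping over the home directories and calling this for each user (followed by the chown) covers all the accounts in one pass.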

For SSL, Courier requires the key and the certificate to be in a single .pem file. So I concatenated server.key and server.crt into a single server.pem file.
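
The concatenation is a one-liner, but since the result contains the unprotected private key it makes sense to create it with tight permissions from the start. A sketch with a made-up helper name:

```shell
# Sketch: combine key and certificate into one .pem, readable by the owner only.
# $1 = key file, $2 = certificate file, $3 = output .pem file
build_pem() {
    (
        umask 077                   # the file is created as 0600, never world-readable
        cat "$1" "$2" > "$3"
    )
}
```

For example: "build_pem server.key server.crt server.pem".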

I edited the /etc/courier/imapd-ssl file to make the following changes:

  • Set SSLPORT to 993.
  • Set both IMAPDSSLSTART and IMAPDSTARTTLS options to YES to allow both IMAP over SSL and TLS within IMAP (the latter being a TLS session that's started from within the IMAP session while the former is a plain IMAP session over an SSL tunnel).
  • Set IMAP_TLS_REQUIRED to 0 so that local connections from the web mail server could make use of IMAP without having to do TLS on the local (same system) connection. I planned to still block the standard IMAP port (143) in the firewall, so remote clients would not be able to access their mail without SSL/TLS.
  • Set TLS_CERTFILE to point to the recently created server.pem file.

I edited the /etc/courier/imapd file to make the following changes:

  • Added "AUTH=PLAIN" to the IMAP_CAPABILITY setting so that plain text authentication is allowed on non-tls connections to the imap server. This is necessary for the local connection from some web server mail clients which don't come with support for CRAM-MD5 or other non-PLAIN authentication mechanisms.

I edited the /etc/courier/pop3d-ssl file to make the following changes:

  • Set SSLPort to 995.
  • Set both POP3DSSLSTART and POP3DSTARTTLS options to YES to allow both POP3 over SSL and TLS within POP3 (the latter being a TLS session that's started from within the POP3 session while the former is a plain POP3 session over an SSL tunnel).
  • Set POP3_TLS_REQUIRED to 0 so that local connections could make use of POP3 without having to do TLS on the local (same system) connection. I planned to still block the standard POP3 port (110) in the firewall, so remote clients would not be able to access their mail without SSL/TLS. However, this would enable my existing clients which ssh to the server and then use non-TLS POP to still be able to get their email.
  • Set TLS_CERTFILE to point to the recently created server.pem file.

Restarted the courier related services:

service courier-imap stop
service courier-imap-ssl stop
service courier-pop stop
service courier-pop-ssl stop
service courier-imap start
service courier-imap-ssl start
service courier-pop start
service courier-pop-ssl start

Yeah, I probably could have simply used the "restart" command on each of them, but I wanted to have them all stopped and then start them all so I was sure that they all came up cleanly under the same configuration.
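
The same stop-everything-then-start-everything sequence can be written as a loop. This is just a sketch: the function name restart_all and its optional argument (a command to run in place of "service", handy for a dry run) are assumptions added for illustration.

```shell
# Service names as used in the commands above.
SERVICES="courier-imap courier-imap-ssl courier-pop courier-pop-ssl"

# Stop every service first, then start them all again.
# $1 (optional): command to run instead of "service", e.g. "echo" for a dry run.
restart_all() {
    runner=${1:-service}
    for s in $SERVICES; do "$runner" "$s" stop; done
    for s in $SERVICES; do "$runner" "$s" start; done
}

# Example dry run (prints the actions instead of performing them):
#   restart_all echo
```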

Now it was time to test things. First a quick telnet connection to the local imap port (143):

# telnet server 143
* OK [CAPABILITY IMAP4rev1 UIDPLUS CHILDREN NAMESPACE THREAD=ORDEREDSUBJECT THRE
AD=REFERENCES AUTH=PLAIN SORT QUOTA IDLE ACL ACL2=UNION] Courier-IMAP ready. Cop
yright 1998-2005 Double Precision, Inc.  See COPYING for distribution informatio
n.
01 LOGIN username password
01 OK LOGIN Ok.
0000 logout
* BYE Courier-IMAP server shutting down
0000 OK LOGOUT completed
closed

So that worked. I ran a similar test for POP3 which also worked. Now I was ready for some remote testing. First step was to go back to my firewall and open ports 993 (IMAPS) and 995 (POP3S) to allow incoming connections to the IMAP and POP services.

Then I went to http://www.wormly.com/test_pop3_mail_server and ran several tests with the POP3S implementation (with test accounts, of course) which all worked fine.

I didn't see a similar testing tool for IMAP, so I ran some tests from one of my home computers using the following command:

openssl s_client -crlf -connect mail.cahillfamily.com:993

This worked like a charm (with some finagling with the /etc/hosts file to override mail.cahillfamily.com's IP address), so at this point I figured I had IMAP and POP up and running.

Authenticated SMTP

When setting up an SMTP server, you have to be very careful that you don't configure your server as an open relay (where it will send mail from anyone to anyone). It seems that hackers, scammers and spammers are forever looking for new open relays that they can use to send out spam and shortly after opening an SMTP port on the internet you can usually find attempts to make use of the server as a relay.

For basic unauthenticated SMTP (i.e. where there's no local user authentication within the SMTP session), I configured the server to only accept incoming mail whose delivery address is within one of my managed domains. Any mail with a destination address outside of my domains is rejected before we accept the mail message itself.

However, that configuration wouldn't work very well for my users, who typically do want to send mail to people outside of my domain. In the past, my solution was simple: ssh tunnel to my host, then send mail via SMTP on the local host interface, where I could treat any local connections as, by default, authenticated.

While I am continuing to allow that configuration with the new server setup, it wouldn't work for those users trying to use a mail client without the ssh tunnel. So I had to enable authenticated SMTP and I had to configure it to require such sessions over SSL.

The SMTP server is managed by Postfix itself. So the first step was to modify the /etc/postfix/main.cf configuration file to only accept mail from clients in my networks or mail addressed to recipients in domains I handle:

#
# restrict smtp operations on unauthenticated (port 25) connections
#
smtpd_recipient_restrictions = permit_mynetworks,reject_unauth_destination

Then I modified the /etc/postfix/master.cf configuration file to enable both TLS within SMTP sessions and SMTP over SSL/TLS by including the following directives:

submission inet n       -       -       -       -       smtpd
  -o smtpd_tls_security_level=encrypt
  -o smtpd_sasl_auth_enable=yes
  -o smtpd_client_restrictions=permit_sasl_authenticated,reject
  -o smtpd_recipient_restrictions=permit_sasl_authenticated,reject
smtps     inet  n       -       -       -       -       smtpd
  -o smtpd_tls_wrappermode=yes
  -o smtpd_sasl_auth_enable=yes
  -o smtpd_client_restrictions=permit_sasl_authenticated,reject
  -o smtpd_recipient_restrictions=permit_sasl_authenticated,reject

These settings, along with the base configuration, should give me server to server SMTP on port 25 and client to server user authenticated SMTP over TLS/SSL on ports 465 and 587.

Now that I have SMTP which allows for authentication, I had to install and configure the sasl authentication daemon as follows:

  1. I installed the package using:

    apt-get install libsasl2 sasl2-bin
  2. I edited the /etc/default/saslauthd file to make the following changes:

    • Set START=yes so the daemon will start.
    • Configured saslauthd to place its runtime information underneath the postfix chroot environment by changing the OPTIONS parameter and adding the following lines:
      PWDIR="/var/spool/postfix/var/run/saslauthd"
      OPTIONS="-c -m ${PWDIR}"
      PIDFILE="${PWDIR}/saslauthd.pid"
  3. I created the saslauthd run directory using:
    mkdir -p /var/spool/postfix/var/run/saslauthd
  4. Configured saslauthd to leave its files readable by postfix (so postfix could communicate with the daemon) using the following command:
    dpkg-statoverride --force --update --add root sasl 755 \
                  /var/spool/postfix/var/run/saslauthd 
  5. Created /etc/postfix/sasl/smtpd.conf file and added the following lines:
    pwcheck_method: saslauthd
    mech_list: plain login
  6. Restarted both saslauthd and postfix

Now I was ready to start testing, so I went back to my firewall and opened ports 25 (SMTP), 465 (SMTP over SSL) and 587 (TLS within SMTP).

To test all of this you could use a mail client, or if you're a bit more adventurous (and want to see exactly what's going on) you can do this manually within a telnet/openssl connection. The following is an example test session:

$ telnet localhost 25
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
220 mail.cahillfamily.com ESMTP Postfix (Ubuntu)
ehlo mail.cahillfamily.com
250-mail.cahillfamily.com
250-PIPELINING
250-SIZE 10240000
250-ETRN
250-STARTTLS
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN
mail from: user@localhost
250 2.1.0 Ok
rcpt to: someuser@someotherhost.com
250 2.1.5 Ok
data
354 End data with <CR><LF>.<CR><LF>
Subject: Test message

Sending yet another test... hope it gets there...

.
250 2.0.0 Ok: queued as B28C461A0A9
quit
221 2.0.0 Bye
Connection closed by foreign host.

That's a standard, unauthenticated SMTP session. I find that using a manual session for testing makes it easier to identify what the problem is when there is a problem. For example, in the list of responses after my "ehlo" command, you see "250-STARTTLS", which indicates that TLS is enabled within the server.

To test an authenticated SMTP session, you will need to enter a command similar to the following (I usually do this right after the "ehlo" command, though I'm not sure if it has to be exactly there):

auth plain AHVzZXJpZABwYXNzd29yZA==
235 2.7.0 Authentication successful

The "AHVzZXJpZABwYXNzd29yZA==" parameter is a base64 encoding of a plain text SASL authentication string. You can generate one manually using the following perl command:

perl -MMIME::Base64 -e 'print encode_base64("\000userid\000password")'

Where userid = the test user's id and password = the test user's password. If you have a special character in either string, such as an @ in the user id (e.g. user@host), you need to escape the character (e.g. "\@").
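
If perl isn't handy, the same string can be built in the shell. This is a sketch that assumes the base64 utility (part of GNU coreutils) is installed; the function name sasl_plain is made up for illustration. Because the user id and password are passed as arguments rather than interpolated into a perl string, an @ needs no escaping here.

```shell
# Build the argument for "auth plain": NUL, user id, NUL, password, base64 encoded.
# $1 = user id, $2 = password
sasl_plain() {
    printf '\0%s\0%s' "$1" "$2" | base64
}

sasl_plain userid password
# prints AHVzZXJpZABwYXNzd29yZA==
```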

So, now that I have all that, I ran the following tests:

  • Test local unauthenticated SMTP connection to send mail to remote system (for my clients that ssh to server and send out from there)
    telnet localhost 25 
    and then run through SMTP session described above.
  • Test remote unauthenticated SMTP connection doesn't allow sending mail to remote locations. Go to a remote system and run:
    telnet mail.cahillfamily.com 25
    and try SMTP session above - should fail with either permission denied or relay access denied when you enter the "rcpt to" command.
  • Test remote unauthenticated connection to the submission port (587, TLS within SMTP) as follows:
    openssl s_client -starttls smtp -crlf -connect mail.cahillfamily.com:587

    and try SMTP session above - should also fail, this time with permission denied since we only set up authenticated SASL connections on this port.

  • Test remote authenticated connection to the submission port (587, TLS within SMTP) using the following:
    openssl s_client -starttls smtp -crlf -connect mail.cahillfamily.com:587
    and this time include the "AUTH PLAIN" command at the start of the session. This should succeed.
  • Test remote authenticated SMTP over SSL connection (port 465) as follows:
    openssl s_client -crlf -connect mail.cahillfamily.com:465
    and include the "AUTH PLAIN" command at the start of the session. This should succeed.

Web Server Mail Client

For browser clients, there are a couple of obvious possibilities that come to mind:

  • SqWebMail - a component of the Courier Mail Server which provides access to mail files via direct access to the mailboxes.
  • Squirrel Mail - a web server based mail client that gets lots of good recommendations as being one of the best open source solutions. This tool uses the IMAP interface to access the user's mail files rather than direct manipulation.

    As a bonus, this tool also has an available Outlook-like plug-in that gives users the look/feel of Outlook 2003.

I took a look at the two tools and decided to go with Squirrel Mail and, for now, just install the base toolkit. I'll explore the Outlook model at some point in the future. Ubuntu has SquirrelMail available as a standard package so I installed it using the following command:

apt-get install squirrelmail

I then modified the /etc/apache2/sites-available/mail.cahillfamily.com configuration file to use the squirrelmail application as the document root, so my users go straight into the application when they visit mail.cahillfamily.com in a browser. The modified file looks as follows:

NameVirtualHost *:443
<VirtualHost *:443>
      ServerAdmin webmaster@localhost
      ServerName mail.cahillfamily.com

      DocumentRoot /usr/share/squirrelmail
      ErrorLog /var/log/apache2/error.log

      # Possible values include: debug, info, notice, warn, error, crit,
      # alert, emerg.
      LogLevel warn

      CustomLog /var/log/apache2/access.log combined
      ServerSignature On

      SSLEngine On
      SSLCertificateFile /etc/ssl/server.crt
      SSLCertificateKeyFile /etc/ssl/server.key

      Include /etc/squirrelmail/apache.conf

</VirtualHost>

Used my browser to test it and everything seems kosher.

SPAM filtering

To filter or not to filter.... For many years I ran my server with no server-side filtering and instead relied on client filtering. However, the abundance of crap that keeps on coming only seems to grow exponentially every year and I finally convinced myself that not only was server side filtering necessary, but it was mandatory. This is especially evident when you're trying to download mail after having been disconnected for a day or so and find that you have hundreds of email messages, most of which are clearly SPAM.

I use spamassassin for spam filtering. Looking around at most of the how-to's/docs I see that most people recommend using spamassassin to just flag spam, but then go ahead and deliver it to the user's mailbox. This is probably the best solution if you don't want to lose any potential emails that have incorrectly been marked as SPAM. However, that means that my clients have to download hundreds of spam messages just to throw them out when they get to the client.

For my system, I'd rather have Spamassassin get rid of at least some spam and then let some of the questionable stuff through. So, I've set things up such that mail messages that get a Spamassassin grade of 10 or higher get saved off into a directory on the server (one directory for each day to ease management). For messages that have a grade between 5 and 10, the subject gets re-written to include a SPAM indicator, but the message is still delivered to the intended recipient.

I've been doing it this way for the past 2 years. We get on the order of five thousand (yeah: 5,000) messages culled this way each day and I've yet to find or get a report of any false positives. Note that there's still a bunch of email that gets through with grades between 5 and 10.
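
The policy described above boils down to two thresholds. The following is only a sketch of that decision, not the spamchk script itself (which isn't reproduced here); the function name classify_score and the three labels are made up for illustration. It uses awk for the comparison because Spamassassin scores are decimal numbers, which plain shell arithmetic can't compare.

```shell
# Decide what to do with a message given its Spamassassin score.
#   score >= 10      -> quarantine (saved on the server, never delivered)
#   5 <= score < 10  -> tag (subject rewritten, still delivered)
#   score < 5        -> deliver (left untouched)
classify_score() {
    awk -v s="$1" 'BEGIN {
        if (s >= 10)     print "quarantine"
        else if (s >= 5) print "tag"
        else             print "deliver"
    }'
}

# Example: classify_score 7.3   prints "tag"
```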

Anyway, to set this up on the new server:

  • Install the latest version of spamassassin using:
    apt-get update
    apt-get install spamassassin spamd
    
  • Installed spamchk script (not sure where I originally got it, but I've been using it on my old mail server for several years now) in /usr/local/bin/spamchk
  • Created /var/spool/postfix/spam/save and /var/spool/postfix/spam/tmp directories for processed messages
  • Edited the /etc/postfix/master.cf file to add a content filter for mail coming in on the default smtp connection (we don't need it on the SSL connections since they are authenticated) and to add the spamchk invocation. Modified lines look as follows:
    smtp      inet  n       -       -       -       -       smtpd
      -o content_filter=spamchk:dummy
    

    And at the end of the file, added:

    #
    # SpamAssassin check filter
    #
    spamchk   unix  -       n       n       -       10      pipe
      flags=Rq user=spamd argv=/usr/local/bin/spamchk -f ${sender} -- ${recipient}
    
  • By default, Spamassassin places a detailed spam report in any message that is flagged as spam (spam score >= 5) and moves the original message to an attachment. I find this cumbersome and so instead I like to flag the subject of the message with a "[SPAM]" flag and otherwise leave the message alone (you do still get the Spamassassin headers added to the message, but they are hidden from the default view in most mailers).

    To achieve this, I edited the /etc/mail/spamassassin/local.cf file and made the following changes:

    *** local.cf.orig       2010-01-02 10:45:58.000000000 -0500
    --- local.cf    2010-01-09 20:54:46.000000000 -0500
    ***************
    *** 9,21 ****
    
    #   Add *****SPAM***** to the Subject header of spam e-mails
    #
    ! # rewrite_header Subject *****SPAM*****
    
    
    #   Save spam messages as a message/rfc822 MIME attachment instead of
    #   modifying the original message (0: off, 2: use text/plain instead)
    #
    ! # report_safe 1
    
    
    #   Set which networks or hosts are considered 'trusted' by your mail
    --- 9,21 ----
    
    #   Add *****SPAM***** to the Subject header of spam e-mails
    #
    ! rewrite_header Subject [SPAM]
    
    
    #   Save spam messages as a message/rfc822 MIME attachment instead of
    #   modifying the original message (0: off, 2: use text/plain instead)
    #
    ! report_safe 0
    
    
    #   Set which networks or hosts are considered 'trusted' by your mail
    
  • Spamassassin likes to learn about its mistakes (both positive and negative). Since my users don't have local access to the system, I need to add aliases which allow people to forward mail attachments that are or are not spam so that Spamassassin can use that information in its learnings.

    First step was to get the sa-wrapper.pl script from Stefan Jakobs. This script depends on the perl module MIME::Tools, which I downloaded and installed (along with a bunch of its dependencies) using the following command:

    cpan -i MIME::Tools

    Then I setup the aliases in /etc/aliases as follows:

    # Spam training aliases
    spam: "|/usr/local/bin/sa-wrapper.pl -L spam"
    ham: "|/usr/local/bin/sa-wrapper.pl -L ham"
    

    When I tested it, the script failed because it couldn't open/write to the log file. I manually created the log file and set it to be writable by the tool.

The Switchover

The switchover had to be handled carefully so as to lose no mail (or as little as possible) as I moved things. The sequence I worked out and used was as follows:

  1. Stop mail services on both the old and the new servers -- ALL mail services: SMTP, POP3, IMAP, etc.
  2. On the old server, tar up all of the existing user accounts and user mailboxes and transfer them to the new server.
  3. Copy the /etc/passwd and /etc/shadow files to the new server and copy out the user accounts that are moving and add them to the existing /etc/passwd and /etc/shadow files on the new server.
  4. Copy the /etc/postfix configuration files from the old server to the new server and merge in any of the local settings from the old server. In particular the virtual domains information for all of the domains I host had to be incorporated into the new setup.
  5. Copy the /etc/aliases file from the old server to the new server editing the file to remove any extraneous/old/useless entries. Run newaliases to notify Postfix of the changes.
  6. Untar the user accounts in /home on the new server and set the owner/group ownership as necessary.
  7. Convert Mbox mailboxes to the new Maildir format on the new server.

    While I do a lot of relaying of mail, there are a number of people who actually get their mail off of my server, so I needed to move their incoming mail to the new server. And because we changed from mbox format to Maildir format, I needed to split the mail up into individual files.

    I found a perl script to do the conversion (mb2md) which I downloaded from here. Ran a few tests and figured out that I would use the command as follows:

    mb2md -s "full path to mbox file" -d "full path to Maildir directory"
    And, since I was doing this as root, I would need to:
    chown -R user:group "full path to Maildir directory"
    so that the right user owned all the files.
  8. Create Maildir structures for those users who didn't have mail in their mailboxes.

    For those users who didn't have mail sitting in their mbox files on the old system, I would need to create the correct hierarchy within their login directory for Maildir delivery. So I ran a script similar to the following (I just did it from the command line, so I don't have an actual copy of the script) in /home:

    for user in user_list
    do
        mkdir $user/Maildir $user/Maildir/cur $user/Maildir/new $user/Maildir/tmp
        chown -R $user $user/Maildir
    done
    
  9. On both servers: Edit the DNS records to change the IP address for mail.cahillfamily.com to be the new server and assign the name oldmail.cahillfamily.com to the old server. And, of course, publish these changes.
  10. Enable mail services on the new server (do not, for at least a day or so, enable mail services on the old server in order to force any mail in other SMTP queues to go to the new server).
  11. Test the setup by sending emails to various users in my hosted domains from local clients, clients in my home and from my work email account to ensure that the changes had propagated out to the real world.

Epilogue

That's about it... At least what I remember. I'm sure that there are things I did during the move that I forgot to write down, but I did try to record everything. I'll update this if/when I figure out anything I did wrong or forgot to note.

I hope someone out there finds this useful. I know I will the next time I need to move the mail server to a new system.


Tuesday, March 03, 2009

Cool gadget #14

I've always been an anti-multifunction office device kind of person. If you got a good printer, it sucked at scanning or faxing. If you got a good fax, it sucked at printing or scanning. If you wanted to print a lot inexpensively, you used a monochrome laser printer. If you wanted to print color, you used an ink jet type printer. None of the multifunction devices seemed to be good enough to replace multiple dedicated devices.

In my home office, I've had a good monochrome laserjet printer (HP 4000TN), a good inkjet printer (HP 1200DN), an excellent fax machine (Xerox something or other), a good copier (again a Xerox something or other) and a decent scanner.

Well, that has finally changed. The quality of all-in-one devices has gotten good enough that I now find them acceptable for most office tasks. Well, I guess I should clarify that I find the higher end devices satisfactory. The low end devices still are missing or have brain dead implementations of many of the core features that I require. The HP Color Laserjet CM2320fxi Multifunction Printer is one such all-in-one printer. The cost is a bit high for some home purchases (I paid a discounted $850), but the functionality gives me all the magic features I needed and does them all well enough that I can get rid of the existing multiple devices I have lying about which, together, accomplish some of the same tasks.

This device does the following tasks very well:

  • Built-in network printing from any computer in the house.
  • Black/white laser printing
  • Color laser printing -- looks as good as anything I've gotten off inkjets
  • Automatic duplex printing (printing both sides of the paper).
  • Black/white copying (single or multi-page)
  • Color copying (single or multi-page)
  • Fax sending/receiving with auto document feed
  • Automatic Scanning to email of multi-page documents (PDF)
  • Print directly from camera memory cards

My only complaints are:

  • it is somewhat more noisy than my old laserjet printer, though after a few weeks I've gotten used to it and don't notice it all that much
  • it is tall (because of the scanner unit on top with space for paper outputs and with 2 input trays). So tall that I haven't hooked up the 2nd input tray or the top would hit the cabinets above. It would be nice if the scanner/control unit could be separated and placed to the side of the printer. Yeah that would look like two devices, but it would make it easier for my kids to see the top buttons.

Those are relatively minor nits. We are extremely happy with this printer and all of its features.

That said, I do continue to own a desktop flatbed photo scanner and a dedicated film scanner. I could probably do most of what I want to do with the flatbed scanner with the new device. However, there's a lot of convenience to having it on my desk, easily reachable when scanning many prints, and I can take it with me when I go to the parents' house to scan old pictures there.

So while I have gotten rid of the fax machine, copier, laser printer and inkjet, I still have some specialized devices lying about. And, BTW, I sold the old devices for $100 and sent back the inkjet printer to HP for an upgrade rebate of another $100, so the net cost to me was just $650.


Friday, February 27, 2009

Exercising on the road

I've spent the past week bouncing up and down the west coast between San Jose, CA and Portland, OR -- not spending even 48 hours in either location at any point.

This threw a wrench into my exercise program because not only did I have to find the time to exercise, I also had to figure out what to do with my sweaty clothes when I checked out each day.

At first glance, you might think that's easy -- just put the wet clothes in one of the plastic laundry bags and pack it. That is what I typically do when I'm checking out on my way home. However, since I wanted to use the clothes to exercise each day and I didn't feel like putting on wet clothes to go work out, I needed to dry them out.

When I'm staying at the same place, I can just let them air dry and that works well enough. However, since I had to change hotels 3 times this week, I needed something else to do. I could have used the iron to heat up and steam them out, but it just felt like something was wrong with ironing sweat into my clothes.

I ended up using the room blow dryer to just blow them dry. Worked fine. Clothes were dry each day and nothing appeared to be growing on them (plus the rest of my clothes stayed dry).

In case you're wondering, I did an hour on the stationary bike each day. Not too shabby for an old man, if I must say so myself.


Friday, February 20, 2009

Digitizing slides

In the days of film cameras, one of the ways to take a lot of pictures cheaply was to use slide film rather than standard film. The film was around the same price, but when developing slides, you didn't get any prints, so rather than a $10 or $15 bill for the developing, the bill was just $3 or so (if I remember correctly -- in any case it was way cheaper). Others also would claim that the slide film was better for sharing pictures since you could project them to an audience (back in the days of 9 and 11 *inch* black and white TVs and *no* computers, there wasn't any other way to do it).

So, I have thousands of slides that I have taken over the years (several hundred from my honeymoon alone) and my mother-in-law brought over a bunch of slides that Angie's father had taken over the years (going back to the late 50s). I want to get all of these scanned into the computer so that we can share them and, if desired, print them.

Before I get into the nitty gritty, I want to lay out some ground rules that I have for scanning large batches of slides/negatives. These have grown out of my experience scanning film and your mileage may vary, but I think they are a good starting point for anybody thinking about a similar project. They include:

  • I want the process as automated as possible so that I can do real work while the scanning is going on. Processes that require manual intervention every few minutes mean that I have to dedicate large amounts of spare time that I just don't have (like any of you do).
  • I want "good enough" quality pictures to come out of the scan process so that I don't have to do any manual processing of the photos (other than rotating them). When I first started scanning negatives, I would do a raw scan at high resolution and then spend 15 to 20 minutes per photo to get them to a state where I liked them. This is clearly unacceptable for large amounts of photos.

    So my model is to get them good enough off the scanner so that I can enjoy/share/watch/etc. without any manual processing.

  • I want to be able to easily figure out which slide/negative the photo came from after I'm done scanning in case there's a picture that I want to do more with (such as scanning at high resolution and lots of manual processing so we can print out an 8x10 or 16x20 photo). This means that I need to be able to figure out which negative a photo came from without having to resort to a manual search of thousands of slides.
  • I want to preserve the film in case someone wants to work with it years from now.
  • Speed is not the driving factor. Scanning thousands of slides/negatives will take time. What is key is that the work can be done while I'm doing other stuff. This leads to some choices on the scanning which actually make the scans take longer, but you get better quality scans and you get to keep working on the day job while you're doing the scanning.

These ground rules led to a number of choices I made in setting up this process. As I describe the process, I'll try to explain why and how I made these choices.

Choosing the scanner

The first issue to address is how I am going to scan the slides themselves. There are two basic options for scanning slides:

  • Using the slide adapter that comes with most flatbed photo scanners (if you have a multi-function device, otherwise known as an all-in-one, you're probably out of luck as they don't seem to come with options for scanning slides). These adapters typically require that you place some number of slides (typically 3 or 4) into the adapter, remove the typical white background for document scanning and then scan the slides.

    I find this process painful for many reasons, the biggest one being that it's very time consuming and manual in nature. However, this isn't too bad if you don't have a bazillion slides to process.

  • Using a film scanner designed to scan slides and negatives (film) rather than scanning documents/photos. These typically do a much better job on film than the flatbed scanners and they usually also have substantial automation capabilities.

It just so happens that I have both types of scanners and for me the clear choice was to use the film scanner. My film scanner is a Nikon Super Coolscan 4000 ED (it's about 5 years old and has been superseded by the newer 5000ED).

Organizing for scanning

If you're like most people, your slides have not stayed in their little boxes that you get back from the developer and frequently they are intermingled (in some cases within one of those slide projector trays, in other cases in the little slide shoe box where you threw all the slides).

One note about handling slides: Most slides are raw film stored within a cardboard or plastic mount which just holds the film without providing any protection to the film itself. You should use care when handling the slides to keep fingerprints, water, dust, etc. off the slides. I recommend using low-cost lint free gloves available at most photo shops when handling the slides.

You can choose to stay with the disorganization and just scan things, or you can put the slides back into their original sets. I chose to do the latter because figuring out what's on slides and telling stories about them frequently is helped by the nearby slides on the same strip of film. Getting the slides back into the set and then perusing them in order helps greatly.

To get them back into sets, you need to look at each slide. Most slides, even those printed many years ago, will have two pieces of information on each slide: a slide number in one of the corners and a processing month/year stamp. Sometimes this information is printed on the slide. Sometimes it's embossed in the cardboard mount. In many cases, the printing is hard to read and you have to use some sleuthing to figure out what set the slide belongs to and what slide number it is in that set. In the slide below you can fairly clearly see the slide number (34), but the processing date (May 89) is embossed on the cardboard and a bit harder to see.

Once I had them all grouped in sets & ordered by slide number I simply rubber banded them and put them into my to-be-done box and then started cranking.

Scanning the slides

Setting up the scanner

My 4000ED has an optional slide feeder (SF-200) which can feed up to 50 slides at a time for automated processing. This is ideal for my project. However, in many of the reviews of the product and in various support web sites, I found that there were many complaints about slides jamming in the machine -- which would really interfere with my automatic process requirement. I came close to just blindly upgrading to the latest version of the feeder (SF-210) thinking that it had to be better than the one I already had. However, from the reviews that didn't seem to be the case.

I should note that after looking at the wide variety of slides that I had in my collection (especially when I added in the older slides from my mother-in-law) it isn't so surprising that this is an issue. The slides vary greatly in materials (plastic, cardboard, even some metal) and they varied greatly in thickness.

All that said, I found one suggestion in an Amazon review that recommended tilting the scanner about 10 degrees and instead of using the spring-loaded slide pusher, place a C battery into the tray (it would roll down with the slides adding just a small amount of continuous, even, pressure). I gave that solution a whirl and across about 2K slides only had 6 or so jams -- two of which were caused by material defects in the slide mounting (the film had curved out of the mount and caught on the next slide causing the two to load simultaneously). Not bad.

To accomplish this I used two index card packs to raise one side of the scanner and just placed the battery into the tray, as you can see below:

Setting up the scanner software

Nikon Scan 4 is the software package that comes with the scanner. I modified the default settings to enable the following features:

  • Enabled Digital ICE - which does a great job getting rid of dust and small scratches -- it's not perfect, but it does work pretty well.
  • Enabled Digital ROC and Digital GEM post processing - these do a level of fade & color correction that makes many scans presentable that otherwise wouldn't be without a lot of manual processing.
  • Enabled multi-scanning 2x - each slide is scanned twice and the scanned data is averaged together -- this gets a better scanned picture on most slides.
  • Set resolution to 2,000 pixels/inch (about 1/2 the full resolution of the scanner) at 100% scale, just to keep the pictures down to a reasonable size on disk and to make some of the post processing more efficient. I can always come back later if I want a better quality scan of a particular slide.
  • For each batch scan, I set the file name to a one-up sequence starting with the year, followed by a two-digit sequential number for the slide within the slide set (so, for example, the slides I recently scanned had a base file name of si2009001). When I processed the next batch, I would increase the base file name by one (e.g. si2009002). The net result is that I can tell which slide set, and which slide within that set, a digital file came from. For example, a digital file with the name si200904523.jpg came from slide 23 in the 45th slide set scanned in 2009.
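
The naming scheme in the last bullet can be sketched in a few lines of Python. This is only an illustration and not something Nikon Scan provides: the helper names slide_file_name and parse_slide_file_name are made up, and the field widths (four-digit year, three-digit set number, two-digit slide number) are inferred from the si200904523.jpg example.

```python
import re

# Hypothetical helpers illustrating the naming scheme; the field widths
# (4-digit year, 3-digit set, 2-digit slide) are an assumption inferred
# from the example file name si200904523.jpg.
PATTERN = re.compile(r"^si(\d{4})(\d{3})(\d{2})\.jpg$")


def slide_file_name(year: int, set_number: int, slide_number: int) -> str:
    """Build a file name such as si200904523.jpg."""
    if not (1 <= set_number <= 999 and 0 <= slide_number <= 99):
        raise ValueError("set or slide number does not fit the fixed-width scheme")
    return f"si{year:04d}{set_number:03d}{slide_number:02d}.jpg"


def parse_slide_file_name(name: str) -> tuple[int, int, int]:
    """Return (year, set_number, slide_number) for a scanned file name."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a slide file name: {name!r}")
    year, set_number, slide_number = (int(part) for part in match.groups())
    return year, set_number, slide_number


print(slide_file_name(2009, 45, 23))             # si200904523.jpg
print(parse_slide_file_name("si200904523.jpg"))  # (2009, 45, 23)
```

One consequence of the fixed widths is that a slide number above 99 would not fit in two digits, so a set like that would need a wider field.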

Loading the slides

Emulsion side - Each slide has an emulsion side and a smooth side. The emulsion side is the side on which the image is recorded, and it is recorded backwards (to view the slide correctly you look through it from the non-emulsion side). This is important because most scanners will tell you that they want the emulsion side facing a particular way, either by directly mentioning the emulsion side or by using pictures of a slide with an ABC on it (when the ABC is backwards you are looking at the emulsion side). On most slides that have some kind of printing, the side that says "this side toward screen" or something like that is the emulsion side, and the slide number and date stamp are typically on the viewing (non-emulsion) side.

Up vs down - the orientation of the slides (which edge is up) seems to be somewhat random with respect to the printing on the slides. In some cases they are in sync (the slide is correctly oriented when the number/date stamp is at the top). In other cases it's the opposite (the number/date stamp needs to be upside down at the bottom for the slide to be oriented correctly). I found I had to look at a few slides in each set to figure out which way that set worked.

Landscape vs Portrait - while slides usually appear square, the film within the slide is not. When you're holding the camera horizontally (the normal position) the image will be recorded in a landscape mode (where the width of the image is longer than the height of the image). When you're holding the camera vertically (on its side) the image will be recorded in portrait mode (longer height, shorter width). This is important in slides because in most scanners you should not turn the slide to correctly orient the picture if it was taken in portrait mode. Just scan the picture in landscape mode and later, in software, rotate it 90 degrees to get it into portrait mode. The reason for this is that most scanners only scan the landscape portion of the slide and will miss some of the slide while recording some of the mount if you scan the slide in portrait mode.

Slide Numbers - most slide sets do not start with slide 1 (at least most of mine did not) and frequently they have slides missing (sometimes simply because the slide image was blank). I wanted the actual slide numbers to match the file names, so I would start the file numbers with the first slide number and make sure the slides were loaded in sequential order, filling the gaps left by missing slides with slides from the end of the set. When I had to do this, I would go back to the files after the set was scanned and manually renumber the fill-in slides so that the file names correctly represented their slide numbers.
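
The bookkeeping for that renumbering can be sketched in Python. This is a minimal illustration under a couple of assumptions: renumber_plan is a hypothetical helper, and it expects a list of the actual slide numbers written down in the order the slides were loaded into the feeder. It only works out which sequentially numbered files need a different number; it does not rename anything.

```python
def renumber_plan(loaded_slide_numbers: list[int]) -> dict[int, int]:
    """Map scanned file number -> actual slide number for fill-in slides.

    loaded_slide_numbers holds the real slide numbers in feeder order.
    The files are numbered sequentially starting at the first slide's
    number, so any position whose slide number differs is a fill-in.
    """
    if not loaded_slide_numbers:
        return {}
    start = loaded_slide_numbers[0]
    plan = {}
    for offset, actual in enumerate(loaded_slide_numbers):
        file_number = start + offset
        if file_number != actual:
            plan[file_number] = actual
    return plan


# A set numbered 3, 4, 6, 7, 9, 10: slides 5 and 8 are missing, so in this
# made-up example the gaps were filled with slides 10 and 9 from the end.
print(renumber_plan([3, 4, 10, 6, 7, 9]))  # {5: 10, 8: 9}
```

In that example the file scanned as number 5 is really slide 10 and the file scanned as number 8 is really slide 9, so those are the two files to renumber by hand.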

Scanning the slides

I would simply load a set into the feeder (correctly oriented, emulsion side to the right when looking at the scanner), indicate in the software that I was feeding X slides, and set the starting number at Y. Then I was off to do the real work while the scanner went along chugging through the slides in the feeder.

Slide Storage

In order to be able to quickly locate slides, as well as to provide for archival storage of the slides, I chose to use Print File Archival Slide Preserver sheets for the slides and placed a label on each sheet indicating the slide set (which was part of the digital file name) that the sheet contained:

You can get these at many photography supply stores. I purchased mine at Archival USA.

Once I had the slides stored in the sheets, I placed the slide preserver sheets into Century Box Archival Storage Albums (that I also purchased from Archival USA). Another option would have been to buy the file hangers that Print File makes and simply hang the sheets in a file cabinet, but I preferred the storage box. Anyway, I placed the slide pages into the boxes and placed labels onto the boxes indicating which slide set ranges were in the box.

Miscellaneous Tidbits

Use the magnifying glass, Luke

I found a magnifying glass quite useful for making out the slide numbers and/or date stamps, as well as for working out the orientation of slides that had no markings. It was just plain useful. Get one and have it nearby when you're working on the slides.

Remounting Slides

In some cases, it might be worthwhile to remount slides, for example if the mount is damaged, too thick, or otherwise interferes with scanning the image. I had this with one particular set of slides that came from my mother-in-law. It seems that in the late 1950s in Europe, slides were mounted in metal mounts that sandwiched the film between two pieces of glass. When they got to me, they were in pretty sad shape:

So I ordered some slide mounts, peeled back the metal cover, separated the film from the glass sandwich, and put the film into the new mounts, which scanned much better than the originals had.

Summary

This process seems long and arduous, but in reality the most time consuming part (other than remounting that one metal set) was organizing the slides, because many of them were mixed together, some had no writing on them whatsoever, and many had slide numbers and date stamps that were almost unreadable (the magnifying glass helped there sometimes).

Once the scanning got started, the process essentially amounted to about 5 to 7 minutes to swap slides and store the scanned slides every hour and a half or so (that's about how long it took to go through the average 30 or so slides per set with the settings I had used in the scanner software).
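
To put rough numbers on that, here is a back-of-the-envelope calculation in Python. The inputs are the approximate figures from this post (about 2,000 slides, about 30 slides per set, about an hour and a half per set, 5 to 7 minutes of hands-on time per swap); the 6-minute midpoint and the function name estimate_hours are assumptions made for illustration.

```python
import math


def estimate_hours(total_slides: int, slides_per_set: int = 30,
                   hours_per_set: float = 1.5,
                   hands_on_minutes_per_set: float = 6.0) -> tuple[int, float, float]:
    """Return (sets, scanner_hours, hands_on_hours) for a scanning project."""
    sets = math.ceil(total_slides / slides_per_set)
    scanner_hours = sets * hours_per_set
    hands_on_hours = sets * hands_on_minutes_per_set / 60
    return sets, scanner_hours, hands_on_hours


sets, scanner_hours, hands_on_hours = estimate_hours(2000)
# Prints: 67 sets, ~100.5 scanner hours, ~6.7 hands-on hours
print(f"{sets} sets, ~{scanner_hours:.1f} scanner hours, ~{hands_on_hours:.1f} hands-on hours")
```

So, very roughly, about 100 hours of unattended scanner time against under 7 hours of actually handling slides.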

I'm very happy with most of the pictures and for those that I'm not happy with, the slide itself usually left a lot to be desired -- almost always because of low exposure on the film.

Wednesday, February 18, 2009

Unsubscribing hell...

For some unfathomable reason I decided today to try to unsubscribe from some of the various spam messages I get from reputable companies. I would never try to unsubscribe from the umpteen million messages I get about body parts enlargement (some of which wouldn't look so hot on me if they were enlarged) or performance enhancement, as the act of unsubscribing just confirms that they have a real person on the other end of the email line.

So, reputable companies in the US are required by the CAN-SPAM Act of 2003 to have an opt-out method in each email. From the FTC's web site:

It requires that your email give recipients an opt-out method. You must provide a return email address or another Internet-based response mechanism that allows a recipient to ask you not to send future email messages to that email address, and you must honor the requests. You may create a "menu" of choices to allow a recipient to opt out of certain types of messages, but you must include the option to end any commercial messages from the sender.

Any opt-out mechanism you offer must be able to process opt-out requests for at least 30 days after you send your commercial email. When you receive an opt-out request, the law gives you 10 business days to stop sending email to the requestor's email address. You cannot help another entity send email to that address, or have another entity send email on your behalf to that address. Finally, it's illegal for you to sell or transfer the email addresses of people who choose not to receive your email, even in the form of a mailing list, unless you transfer the addresses so another entity can comply with the law.

So, I took a look at several of my emails... The emails from Lands End, Sears, 1-800-Flowers.com, American Express and Apple all had links and they all worked as one would expect. They either directly unsubscribed you or brought you to a page that gave you a few options (different kinds of emails, change email address, etc.) and one or two clicks and you were done.

Microsoft, on the other hand, was a true royal pain in the *ss. I received an email from them that included the unsubscribe link at the top:

And another at the bottom:

So one would think that it's all kosher, that clicking on the link would get you unsubscribed. However, that wasn't to be the case. What I got when I used that link was a page which said that I had to use my Windows Live ID to manage my settings and that if I didn't have one, I would have to create a Windows Live ID account in order to manage my subscriptions.

So you can't just unsubscribe. You have to create an account on some Microsoft server.

Being the persistent one, I went ahead and did so. That required that I provide an email address and also required out-of-band email validation (where they send you an email that has a link you have to click on to prove that you actually have that email address).

Did that and got logged into Windows Live. However, all the stuff about managing my subscription was gone and there were no clear links on the page that would get me there. So I went back to the email that started this and selected the unsubscribe link again.

This brought me to the "Profile Center" where there was a link for manage subscriptions. I thought I was getting close, but no, there was another roadblock that they threw up. There was no email address in there (they didn't take the one I entered for my Windows Live ID account). So I had to enter it again. And, of course, before I could manage it I had to go through the email validation again.

Then back to the profile page and back to managing subscriptions where I could finally unsubscribe. Now I'm stuck with a Windows Live ID account that I don't want but I don't see any easy way to get rid of it.

I think this rigmarole they have set up is in clear violation of the spirit and intent of the CAN-SPAM Act and should be fixed. I should be able to unsubscribe easily without having to create an account. I should be able to unsubscribe with a minimum of effort.

Kudos to Apple, Sears, and all the rest who, IMHO, got it right. Daggers to Microsoft who clearly got it totally and inexcusably wrong.

Monday, February 16, 2009

Digitizing life

Like many people today, I have a large collection of analog media containing family memories. Much of it is my own, but a substantial portion belongs to either my or my wife's parents. This includes film negatives, slides, prints, video film, video tapes, etc.

The saddest part about this old stuff is that it deteriorates over time (even when aggressive archival storage methods are used). In addition, it's very hard to share and usually gets dispersed as various interested parties (i.e. siblings) request to take one of them (sometimes promising to make a copy and return the original -- and I'm sure some actually do that).

I have piles and piles of pretty much all of that other than video film. I have decided that it's about time to bring it all into the modern digital world and am digitizing all of it -- negatives from all the 35mm photos I took, prints from all of our kids' class/sports photos or from those 4x6s that we don't have negatives for, thousands of slides (which, IMHO, were the old fashioned "digital" camera in that you just paid $3 to get the roll of film developed without any prints and then said you would print the photos you liked, but never got around to it :-)).

When I'm done, I expect to be able to share my entire digital collection with my family either directly or when I post the more interesting photos on Facebook :-). I also expect that when my kids grow up and leave the house, they will each be able to take a copy of our entire collection with them to be able to peruse whenever they like.

I'm going to write a series of blog entries describing what I've chosen to do for each type of media and how I proceeded. Hopefully some out there will find it useful in one way or another.

BTW - there are a number of services out there that will do this for you for a fee. I've chosen to do it all myself rather than use a service because I want to organize things as I convert and I want to have sensible conversions (if you used the video camera to record your kid's birthday and your friend's kids' school performance you don't want them on the same DVD -- at least I don't). I've also worked to automate the process as much as possible so I can do it while I'm doing other things.

Finally, I've accepted that this will take a long time and not be done overnight and I will methodically work through the piles (and they are large piles).

Wish me luck!
