This document outlines the development of the USGIN Amazon Machine Image. The purpose of this image is to provide a means by which any group can implement a web server capable of delivering data in USGIN-endorsed formats at a minimal startup cost. Providing an AMI also allows a user to avoid hardware issues as well as some of the complicated configurations that might prove an obstacle to participation in USGIN. It is our hope that as much as possible, this AMI can work out-of-the-box, making it easy for geologists and geological surveys to focus on what they do best - generating and sharing really good geoscience data.
If an Amazon EC2 instance crashes, everything on it is lost. Your only recourse is to launch a new instance from the last image you made of your machine. For that reason, we want to make sure that any data, logs or dynamic configuration files are located on an Elastic Block Store (EBS) volume. An EBS volume persists independently of the instance and can be backed up as snapshots to Amazon's S3 service, so this data will not be lost should your EC2 instance crash and burn.
Create an Elastic Block Store Volume
By default, the image that we're using in our instance (ami-ccf615a5) only allows SSH logins with a valid key pair. This means that users defined on the instance cannot log in over SSH using usernames and passwords; they must have the appropriate private key available.
The configuration of SSH access is controlled by the file /etc/ssh/sshd_config
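To confirm how a given instance is configured, the relevant directive can be read straight from that file. This is a minimal sketch, assuming the stock OpenSSH directive PasswordAuthentication is what controls password logins here; the function name password_auth_setting is made up for this example.

```shell
# Print the PasswordAuthentication value found in an sshd_config file.
# $1 = path to the config file (defaults to /etc/ssh/sshd_config).
# Prints nothing if the directive is absent or commented out.
password_auth_setting() {
    conf="${1:-/etc/ssh/sshd_config}"
    # sshd uses the first value it finds for a keyword, so stop at the
    # first match; keywords are case-insensitive.
    awk 'tolower($1) == "passwordauthentication" { print tolower($2); exit }' "$conf"
}
```

Running password_auth_setting with no arguments should print "no" if password logins are disabled as described above.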
First Time Login
Locate the .pem file that you downloaded when creating the instance. See this post for information on creating an instance. Next, follow these instructions to create a PuTTY Private Key (.ppk) to use to log in using PuTTY.
Building an AMI is essentially a way for you to back up and share the Amazon EC2 instance that you've created. It is also useful for creating a fall-back point if an installation or process goes awry and you'd like to revert to a previous state.
Prepare to Bundle
The basic USGIN Amazon Machine Image will contain a few applications used to provide geoscience data services. All the software is free and open-source. These chapters provide walkthroughs of the installation process that was used to install these applications on our AMI.
Preparation: Updates and Upgrades
apt-get update
apt-get upgrade
These commands update your apt system with what is available in the repositories, and upgrade any packages already installed to which upgrades are available. This seems like a good thing to do on a regular basis, or at least before any software installations.
Installing Apache
apt-get install apache2
Simple. This installs the latest version of Apache HTTP Server 2.x. At the time of this writing, that is version 2.2.14. When the process is complete the server will be started, and you should be able to test it by visiting http://<Elastic IP Address>. You should see a message "It Works!".
Starting and Stopping the Server
The commands are simple:
/etc/init.d/apache2 start
/etc/init.d/apache2 stop
/etc/init.d/apache2 reload
These commands should be run as root. This might sound like a security hole, but it isn't... From the Apache Documentation:
"If the Listen specified in the configuration file is default of 80 (or any other port below 1024), then it is necessary to have root privileges in order to start apache, so that it can bind to this privileged port. Once the server has started and performed a few preliminary activities such as opening its log files, it will launch several child processes which do the work of listening for and answering requests from clients. The main httpd process continues to run as the root user, but the child processes run as a less privileged user."
From another page of the Apache Documentation:
"In typical operation, Apache is started by the root user, and it switches to the user defined by the User directive to serve hits."
If you take a look at /etc/apache2/apache2.conf you'll find
# These need to be set in /etc/apache2/envvars
User ${APACHE_RUN_USER}
Group ${APACHE_RUN_GROUP}
and /etc/apache2/envvars in turn contains
export APACHE_RUN_USER=www-data
export APACHE_RUN_GROUP=www-data
Without you even realizing it, apt created this new user www-data and set up Apache to use it for child processes.
Setup A Website
mkdir /mnt/data-store/sites
mkdir /mnt/data-store/sites/usgin
mkdir /mnt/data-store/sites/usgin/logs
mkdir /mnt/data-store/sites/usgin/www
chown root:adm /mnt/data-store/sites/usgin/logs
chmod 0750 /mnt/data-store/sites/usgin/logs
chmod 0755 /mnt/data-store/sites/usgin/www
cp /var/www/index.html /mnt/data-store/sites/usgin/www
a2ensite usgin
a2dissite default
/etc/init.d/apache2 reload
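The a2ensite usgin command expects a site definition named usgin in /etc/apache2/sites-available, which is not shown here. The following is only a sketch of what such a file could contain, assuming the /mnt/data-store/sites/usgin layout created by the mkdir commands. The log file names and the helper name write_usgin_site are assumptions, not the configuration actually used on the AMI.

```shell
# Write a minimal Apache 2.2 virtual host definition for the "usgin" site.
# $1 = directory to write into (on Ubuntu: /etc/apache2/sites-available)
# $2 = site root (assumed default: /mnt/data-store/sites/usgin)
write_usgin_site() {
    sites_dir="$1"
    site_root="${2:-/mnt/data-store/sites/usgin}"
    cat > "$sites_dir/usgin" <<EOF
<VirtualHost *:80>
    DocumentRoot $site_root/www
    # Log file names are illustrative
    ErrorLog $site_root/logs/error.log
    CustomLog $site_root/logs/access.log combined
    <Directory $site_root/www>
        Options FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>
EOF
}
```

Calling write_usgin_site /etc/apache2/sites-available as root before a2ensite usgin would put the file where Apache looks for it.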
Preparation: Updates and Upgrades
apt-get update
apt-get upgrade
These commands update your apt system with what is available in the repositories, and upgrade any packages already installed to which upgrades are available. This seems like a good thing to do on a regular basis, or at least before any software installations.
Install Sun's Java 6 JDK
You could skip this step and jump down to Installing Tomcat 6.x, but the trouble is that the tomcat6 package by default installs an Open-Source JDK instead of that published by Sun Microsystems. This is all fine and good for Tomcat, but GeoServer does not work with the OpenJDK. Since we'll be wanting to use GeoServer on the machine, we need to install Sun's JDK. Fortunately, it's as easy as anything:
apt-get install sun-java6-jdk
* Note (7/28/2010) -- Ubuntu 10 encourages you to use the OpenJDK by not allowing you to easily install the Sun JDK. See http://www.clickonf5.org/linux/how-install-sun-java-ubuntu-1004-lts/7777 for instructions on how to install the Sun JDK under these circumstances.
Installing Tomcat 6.x
apt-get install tomcat6
It's just that easy. I also installed the admin package which gives us the tomcat manager and host-manager.
apt-get install tomcat6-admin
You may also want to install the documentation.
apt-get install tomcat6-docs
Starting and Stopping Tomcat
Use the following commands to start, stop and restart Tomcat
/etc/init.d/tomcat6 start
/etc/init.d/tomcat6 stop
/etc/init.d/tomcat6 restart
These commands should be run with root privileges. The script you're running contains a command specifying the user that should end up running Tomcat itself. This user is called "tomcat6", is unprivileged, and was created when you installed the Tomcat package.
Configuring Tomcat 6.x
First, we need to define an admin user who can access the admin webapps that we installed. Open the file /etc/tomcat6/tomcat-users.xml and add the <user> line that sits outside the commented-out block:
<tomcat-users>
<!--
  <role rolename="tomcat"/>
  <role rolename="role1"/>
  <user username="tomcat" password="tomcat" roles="tomcat"/>
  <user username="both" password="tomcat" roles="tomcat,role1"/>
  <user username="role1" password="tomcat" roles="role1"/>
-->
  <user username="AdminUserName" password="AdminUserPassword" roles="admin,manager"/>
</tomcat-users>
At this point, restart Tomcat using the commands listed above. You can check that it is working by pointing your web browser to http://<Elastic IP Address>:8080. You should see a simple "It Works!" page. You can also point your browser to http://<Elastic IP Address>:8080/manager/html and log in with the AdminUserName and AdminUserPassword that you entered in the /etc/tomcat6/tomcat-users.xml file.
The next thing to configure is logging. We would like log files to be placed on the Elastic Data-store so that they can be read in the event that the instance crashes and burns. For our purposes, Tomcat logs will reside in /mnt/data-store/tomcat/logs. First we will adjust the paths to the log files in /etc/tomcat6/logging.properties.
Change:
1catalina.org.apache.juli.FileHandler.level = FINE
1catalina.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
1catalina.org.apache.juli.FileHandler.prefix = catalina.
2localhost.org.apache.juli.FileHandler.level = FINE
2localhost.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
2localhost.org.apache.juli.FileHandler.prefix = localhost.
to:
1catalina.org.apache.juli.FileHandler.level = FINE
1catalina.org.apache.juli.FileHandler.directory = /mnt/data-store/tomcat/logs
1catalina.org.apache.juli.FileHandler.prefix = catalina.
2localhost.org.apache.juli.FileHandler.level = FINE
2localhost.org.apache.juli.FileHandler.directory = /mnt/data-store/tomcat/logs
2localhost.org.apache.juli.FileHandler.prefix = localhost.
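The same edit can be scripted rather than made by hand. This is a sketch only: the function name relocate_tomcat_logs is made up, it touches just the two handlers listed here, and it assumes the new directory path contains no "#", "&" or backslash characters. A copy of the original file is kept with a .bak suffix.

```shell
# Point the catalina and localhost log handlers at a new directory.
# $1 = path to logging.properties, $2 = new log directory
relocate_tomcat_logs() {
    props="$1"
    new_dir="$2"
    # Keep a backup, then rewrite only the two "directory" lines;
    # every other line passes through unchanged.
    cp -p "$props" "$props.bak" &&
    sed -E "s#^((1catalina|2localhost)\.org\.apache\.juli\.FileHandler\.directory[[:space:]]*=[[:space:]]*).*#\1${new_dir}#" \
        "$props.bak" > "$props"
}
```

For the setup described here the call would be relocate_tomcat_logs /etc/tomcat6/logging.properties /mnt/data-store/tomcat/logs, run as root.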
Then let's add the directories and set permissions appropriately. I copied the permissions and ownership from the default log location.
mkdir /mnt/data-store/tomcat
mkdir /mnt/data-store/tomcat/logs
chown tomcat6:adm /mnt/data-store/tomcat/logs
chmod 0750 /mnt/data-store/tomcat/logs
There also seems to be a problem with the /etc/tomcat6/policy.d/catalina.policy file that prevents any logs from being written. This file specifies what permissions the .jar file that actually does the logging has. Out-of-the-box, the way this file is written prevents log files from being written anywhere. We need to fix it, and make sure that it has read/write permissions to the directory where we want the logs to be. The listing below is the corrected grant block; the changes include the two FilePermission entries for /mnt/data-store/tomcat/logs:
// These permissions apply to the logging API
grant codeBase "file:${catalina.home}/bin/tomcat-juli.jar" {
        permission java.util.PropertyPermission "java.util.logging.config.class", "read";
        permission java.util.PropertyPermission "java.util.logging.config.file", "read";
        permission java.lang.RuntimePermission "shutdownHooks";
        permission java.io.FilePermission "${catalina.base}${file.separator}conf${file.separator}logging.properties", "read";
        permission java.util.PropertyPermission "catalina.base", "read";
        permission java.util.logging.LoggingPermission "control";
        permission java.io.FilePermission "/mnt/data-store/tomcat/logs", "read, write";
        permission java.io.FilePermission "/mnt/data-store/tomcat/logs/*", "read, write";
        permission java.lang.RuntimePermission "getClassLoader";
        permission java.lang.RuntimePermission "setContextClassLoader";
        // To enable per context logging configuration, permit read access to the appropriate file.
        // Be sure that the logging configuration is secure before enabling such access
        // eg for the examples web application:
        // permission java.io.FilePermission "${catalina.base}${file.separator}webapps${file.separator}examples${file.separator}WEB-INF${file.separator}classes${file.separator}logging.properties", "read";
};
Adjusting Java Memory Allocation
In order for servlets like GeoNetwork and GeoServer to run smoothly, you'll often need to make some adjustments to the memory allocation of the java instance that runs Tomcat. You can do this in a whole bunch of different places, since "Starting Up Tomcat" really means running a whole string of scripts. I made the adjustment by editing /etc/default/tomcat6.
Uncomment the following line and edit it to include the memory options (-Xms, -Xmx, -XX:MaxPermSize and -XX:PermSize):
JAVA_OPTS="-Djava.awt.headless=true -Xms256M -Xmx1024M -XX:MaxPermSize=256m -XX:PermSize=128m"
I did this on a Windows machine following the instructions I laid out in this post. However, when installing Apache on Ubuntu using the apt system, you end up with a pretty strikingly different Apache configuration than I was used to. I found an incredibly useful walkthrough written by Robert Peters that I'll basically re-write here.
apt-get install libapache2-mod-jk
The workers file, /etc/apache2/workers.properties:
#Define 1 real worker using ajp13
worker.list=worker1
#Set properties for worker1 (ajp13)
worker.worker1.type=ajp13
worker.worker1.host=localhost
worker.worker1.port=8009
The mod_jk directives for the Apache configuration:
JkWorkersFile /etc/apache2/workers.properties
JkLogFile /var/log/apache2/mod_jk.log
JkLogLevel info
JkLogStampFormat "[%a %b %d %H:%M:%S %Y]"
JkMount / worker1
JkMount /* worker1
The AJP connector in Tomcat's server.xml. Simply remove the <!-- before and the --> after the line to uncomment it.
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
Then restart both servers:
/etc/init.d/tomcat6 restart
/etc/init.d/apache2 restart
Note: As of Ubuntu 10.04 (Lucid), you can install PostgreSQL 8.4 and PostGIS 1.4 using
sudo apt-get install postgresql-8.4-postgis
There are two ways to go about this: The easy way and the hard way. The easy way is to install PostgreSQL 8.3 and PostGIS 1.3. Both of these are out-of-date versions. The hard way installs PostgreSQL 8.4.1 and PostGIS 1.4. I'll outline both ways here. On our machine, I did it the hard way.
PostgreSQL 8.3 and PostGIS 1.3 (The Easy Way)
apt-get install postgresql-8.3-postgis
... and you're done.
PostgreSQL 8.4.1 and PostGIS 1.4 (The Hard Way)
First of all -- this walkthrough benefits enormously from blog posts by Mark Feeney and Javier de la Torre.
apt-get update
apt-get upgrade
These commands update your apt system with what is available in the repositories, and upgrade any packages already installed to which upgrades are available. This seems like a good thing to do on a regular basis, or at least before any software installations.
/etc/apt/sources.list is a listing of the repositories used by the apt system. In order to proceed, we need to access some non-standard repositories. Add the following two lines to the file:
deb http://ppa.launchpad.net/pitti/postgresql/ubuntu jaunty main
deb-src http://ppa.launchpad.net/pitti/postgresql/ubuntu jaunty main
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 8683D8A2
apt-get update
apt-get install postgresql-8.4
sed -i.bak -e 's/port = 5433/port = 5432/' /etc/postgresql/8.4/main/postgresql.conf
Now stop and restart PostgreSQL:
/etc/init.d/postgresql-8.4 stop
/etc/init.d/postgresql-8.4 start
apt-get install postgresql-server-dev-8.4 libpq-dev
apt-get install libgeos-dev
apt-get install proj
wget http://postgis.refractions.net/download/postgis-1.4.0.tar.gz
tar xvfz postgis-1.4.0.tar.gz
cd postgis-1.4.0
./configure
make
make install
passwd postgres
(enter the password at the prompt)
su postgres
psql -c "ALTER USER postgres WITH PASSWORD '[password]';"
createdb geodb
createlang -d geodb plpgsql
psql -d geodb -f /usr/share/postgresql/8.4/contrib/postgis.sql
psql -d geodb -f /usr/share/postgresql/8.4/contrib/spatial_ref_sys.sql
psql -d geodb -c "SELECT postgis_lib_version();"
If the last command returns "1.4.0" then the template database is properly set up.
Allowing External TCP/IP Connections to PostgreSQL
Having troubles here right now... can get it to work with SSH tunneling though.
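For reference, remote connections to PostgreSQL 8.4 generally depend on two settings: listen_addresses in postgresql.conf and a host entry in pg_hba.conf. Port 5432 must also be open in the instance's EC2 security group. The sketch below is not what was done on the AMI. It assumes md5 password authentication, and the function name and the client address in the usage note are placeholders.

```shell
# Allow one remote client to reach PostgreSQL over TCP/IP.
# $1 = postgresql.conf, $2 = pg_hba.conf, $3 = client address in CIDR form
allow_remote_postgres() {
    pg_conf="$1"
    hba_conf="$2"
    client="$3"
    # Listen on all interfaces instead of only localhost. When a parameter
    # appears more than once in postgresql.conf, the last entry wins, so
    # appending is enough.
    printf "listen_addresses = '*'\n" >> "$pg_conf"
    # Let the client connect to any database as any user with an md5 password.
    printf 'host    all    all    %s    md5\n' "$client" >> "$hba_conf"
}
```

A call might look like allow_remote_postgres /etc/postgresql/8.4/main/postgresql.conf /etc/postgresql/8.4/main/pg_hba.conf 203.0.113.10/32, followed by a restart of PostgreSQL so that the new listen_addresses value takes effect.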
Put Data on the Elastic Volume
mkdir /mnt/data-store/postgresql
mkdir /mnt/data-store/postgresql/data
cp -R /var/lib/postgresql/8.4/main/* /mnt/data-store/postgresql/data
chown -R postgres:postgres /mnt/data-store/postgresql/data
chmod -R 0700 /mnt/data-store/postgresql/data
Then, in /etc/postgresql/8.4/main/postgresql.conf, set:
data_directory = '/mnt/data-store/postgresql/data'
Logging - A lot to learn...
And I haven't done anything about it. No changes have been made to the logging configurations.
Install MySQL 5.1
apt-get install mysql-server-5.1
During installation, specify a password for the root user. I was also prompted to configure Postfix, and selected the "No configuration" option.
Allow Remote Access to the MySQL Server
First of all, we have to tell MySQL to listen to traffic coming from places other than 127.0.0.1. This is done by editing the file at /etc/mysql/my.cnf. Adjust the following line:
bind-address = 0.0.0.0 # this was 127.0.0.1
Allow a Specific User Remote Access
MySQL requires that remote access users be specifically appointed. Issue the following command:
mysql -u root -p
You'll be prompted for the root MySQL user's password. After entering it, hit enter, and you'll be in the mysql console with a "mysql>" prompt. Enter the following two lines:
grant all privileges on *.* to 'root'@'[the ip address you'll be connecting from]'
identified by '[password]';
This means that if user "root" connects from the given IP address using the password you've specified, that user will have full privileges on all databases.
<Context path="/gsvr"
docBase="/mnt/data-store/geoserver/gsvr" debug="0"
reloadable="true" cachingAllowed="false"
allowLinking="true"/>
permission java.security.AllPermission;
/etc/init.d/tomcat6 restart
Setup the MySQL Database for Geonetwork to Use
mysqladmin create geonetwork -u root -p
You will be prompted for the password for the root MySQL user.
mysql -u root -p
(enter password when prompted)
grant all privileges on geonetwork.* to 'geonetwork'@'localhost' identified by 'password';
grant all privileges on geonetwork.* to 'geonetwork'@'159.87.39.14' identified by 'password';
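If the geonetwork user needs to connect from several hosts, the grant statements can be generated instead of typed one by one. This is a sketch with a made-up function name, assuming the password contains no single quotes:

```shell
# Print one GRANT statement per host for the geonetwork database user.
# $1 = password (must not contain single quotes), remaining args = client hosts
geonetwork_grants() {
    password="$1"
    shift
    for host in "$@"; do
        printf "grant all privileges on geonetwork.* to 'geonetwork'@'%s' identified by '%s';\n" \
            "$host" "$password"
    done
}
# Example: geonetwork_grants 'password' localhost 159.87.39.14 | mysql -u root -p
```

The function only prints SQL, so the output can be reviewed before it is piped into the mysql client.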
<AutomatedInstallation langpack="eng">
  <com.izforge.izpack.panels.HelloPanel/>
  <com.izforge.izpack.panels.HTMLLicencePanel/>
  <com.izforge.izpack.panels.TargetPanel>
    <installpath>/mnt/data-store/geonetwork</installpath>
  </com.izforge.izpack.panels.TargetPanel>
  <com.izforge.izpack.panels.PacksPanel>
    <selected>
      <pack index="0"/>
      <pack index="1"/>
      <pack index="2"/>
      <pack index="3"/>
    </selected>
  </com.izforge.izpack.panels.PacksPanel>
  <com.izforge.izpack.panels.InstallPanel/>
  <com.izforge.izpack.panels.ShortcutPanel/>
  <com.izforge.izpack.panels.HTMLInfoPanel/>
  <com.izforge.izpack.panels.FinishPanel/>
</AutomatedInstallation>
Note that you can specify the install location. You'll want it to be on the Elastic Data Store somewhere.
mkdir /mnt/data-store/geonetwork
cd /mnt/data-store/geonetwork
wget -O geonetwork-install-2.4.2-0.jar "http://downloads.sourceforge.net/project/geonetwork/GeoNetwork_opensource/v2.4.2/geonetwork-install-2.4.2-0.jar?use_mirror=softlayer"
java -DTRACE=true -jar geonetwork-install-2.4.2-0.jar <path to your install script>
Point GeoNetwork at the MySQL Backend
You'll be editing a file located at /mnt/data-store/geonetwork/web/geonetwork/WEB-INF/config.xml. Find the <resources> node and its children. Disable the McKoi resource (enabled="false"), enable the MySQL resource (enabled="true"), and fill in the MySQL user, password and URL to match the database created above:
<resources>
  <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
  <!-- mckoi standalone -->
  <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
  <resource enabled="false">
    <name>main-db</name>
    <provider>jeeves.resources.dbms.DbmsPool</provider>
    <config>
      <user>xRgAPQLl</user>
      <password>X7ByXqvJ</password>
      <driver>com.mckoi.JDBCDriver</driver>
      <url>jdbc:mckoi://localhost:9157/</url>
      <poolSize>10</poolSize>
    </config>
    <activator class="org.fao.geonet.activators.McKoiActivator"><configFile>WEB-INF/db/db.conf</configFile></activator>
  </resource>
  <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
  <!-- mysql -->
  <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
  <resource enabled="true">
    <name>main-db</name>
    <provider>jeeves.resources.dbms.DbmsPool</provider>
    <config>
      <user>geonetwork</user>
      <password>password</password>
      <driver>com.mysql.jdbc.Driver</driver>
      <url>jdbc:mysql://localhost:3306/geonetwork</url>
      <poolSize>10</poolSize>
      <reconnectTime>3600</reconnectTime>
    </config>
  </resource>
Adjust GeoServer's Data Directory
GeoNetwork comes with a simple GeoServer installation that is used to draw the basemaps in the Intermap application. The default installation does not point GeoServer at the right place to find its data. Make the following change to /mnt/data-store/geonetwork/web/geoserver/WEB-INF/web.xml:
<context-param>
  <param-name>GEOSERVER_DATA_DIR</param-name>
  <param-value>/mnt/data-store/geonetwork/data/geoserver_data</param-value>
</context-param>
Adjust GeoNetwork Folder Permissions
There's probably a more elegant way to handle this, but for now....
chown -R tomcat6:tomcat6 /mnt/data-store/geonetwork
Add Context Snippets for Tomcat
In /etc/tomcat6/Catalina/localhost, place three files:
geonetwork.xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- configuration to point tomcat at the geonetwork directory -->
<Context docBase="/mnt/data-store/geonetwork/web/geonetwork" path="/geonetwork"></Context>
intermap.xml
<!-- configuration to point tomcat at the intermap directory (map on the web interface for geonetwork; geoserver map client) -->
<Context docBase="/mnt/data-store/geonetwork/web/intermap" path="/intermap"></Context>
geoserver.xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- configuration to point tomcat at the geoserver directory (WMS service etc.) -->
<Context docBase="/mnt/data-store/geonetwork/web/geoserver" path="/geoserver"></Context>
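Since the three files differ only in the application name, they can also be generated in a loop. This is a sketch with a made-up function name; the install directory is passed in as an argument so that the docBase values always match wherever GeoNetwork was actually installed.

```shell
# Write one Tomcat context file per GeoNetwork web application.
# $1 = GeoNetwork install directory
# $2 = Tomcat context directory (defaults to /etc/tomcat6/Catalina/localhost)
write_geonetwork_contexts() {
    install_dir="$1"
    context_dir="${2:-/etc/tomcat6/Catalina/localhost}"
    for app in geonetwork intermap geoserver; do
        cat > "$context_dir/$app.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<Context docBase="$install_dir/web/$app" path="/$app"></Context>
EOF
    done
}
```

Run as root with the install directory as the only argument, then restart Tomcat so the new contexts are picked up.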
Restart Tomcat
/etc/init.d/tomcat6 restart
This section contains tips and helpful hints for using the USGIN EC2 Instance.
"Restarting" the Virtual Machine
You can't actually restart it. Instead, you essentially delete it and roll back to a prior machine image.
Connect to the Instance Using SSH (and PuTTY)
If you already have the .ppk file for the appropriate user that you wish to log in with, use it. You may want to set up SSH Tunneling if you want to use pgAdmin to connect to PostgreSQL.
If you don't have a .ppk file, follow the instructions outlined in this post.
Mounting the Elastic Block Store
First, use the EC2 Console to attach the volume to the instance at /dev/sdh
mkdir /mnt/data-store
mount /dev/sdh /mnt/data-store
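The logs, databases and web content all live under /mnt/data-store, so it can be worth checking that the volume really is mounted before starting any services. This is a small sketch; the function name is_mounted is made up for this example, and it reads the kernel's mount table from /proc/mounts unless another file is given.

```shell
# Succeed if the given directory appears as a mount point in the mount table.
# $1 = mount point, $2 = mount table to read (defaults to /proc/mounts)
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
# Example: is_mounted /mnt/data-store || echo "data store is not mounted" >&2
```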
Starting, Stopping, Restarting Applications
These commands should be executed with root privileges.
/etc/init.d/apache2 start
/etc/init.d/apache2 stop
/etc/init.d/apache2 reload
/etc/init.d/tomcat6 start
/etc/init.d/tomcat6 stop
/etc/init.d/tomcat6 restart
/etc/init.d/postgresql-8.4 start
/etc/init.d/postgresql-8.4 stop
/etc/init.d/postgresql-8.4 restart
/etc/init.d/mysql start
/etc/init.d/mysql stop
/etc/init.d/mysql restart
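When all four services need the same action, for example after mounting the data store on a fresh instance, a small wrapper saves some typing. This is a sketch with a made-up function name. The order (databases first, then Tomcat, then Apache) is an assumption that suits starting up, and the action must be one that every script accepts.

```shell
# Run the same init action (start, stop, restart) for every service.
# $1 = action, $2 = init script directory (defaults to /etc/init.d)
all_services() {
    action="$1"
    init_dir="${2:-/etc/init.d}"
    status=0
    for svc in postgresql-8.4 mysql tomcat6 apache2; do
        # Keep going if one service fails, but remember the failure.
        "$init_dir/$svc" "$action" || { echo "$svc: $action failed" >&2; status=1; }
    done
    return "$status"
}
```

As root, all_services restart runs each script in turn and returns a non-zero status if any of them failed.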
User Administration