Aims

This exercise aims to get you to:

set up your directories on the /srvr file system
install a PostgreSQL database server on /srvr

You ought to get it done before the end of Week 2.

Background

Notation: In the examples below, we have used the $ sign to represent the prompt from the Unix/Linux shell. The actual prompt may look quite different on your computer (e.g. it may contain the computer's hostname, or your username, or the current directory name). In the example interactions, all of the things that the computer displays are in this font. The commands that you are supposed to type are in this bold font. Comments in the examples are introduced by '...' and are written in this grey font; comments do not appear on the computer screen, they simply aim to explain what is happening. Whenever we use the word edit, this means that you should use your favourite text editor (e.g. vi, emacs, gedit). Finally, all references to YOU should be replaced by your CSE username (e.g. I would use jas everywhere that YOU is used).

PostgreSQL has three major components:

the source code (and the compiled *.o files) (approx 150MB)
the installed executables (like pg_ctl and psql) (approx 20MB)
the data (including configuration files and databases) (at least 35MB)

You will not be able to fit the above components under your CSE home directory (insufficient disk quota), so what we have arranged is for you to have an additional directory (folder) called /srvr/YOU with enough space to hold all of the above. You can access this directory via the command:

$ cd /srvr/YOU
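
Before unpacking anything, it can be worth checking that the directory has room for the three components listed above (roughly 150MB + 20MB + 35MB = 205MB). The following is a minimal sketch; the function name free_mb is made up for illustration and is not one of the course tools. It relies only on the portable output format of df -P.

```shell
# free_mb DIR: print the free space (in whole MB) on the filesystem holding DIR.
# Hypothetical helper, not part of the course setup.
free_mb() {
    # df -P prints one header line and one data line; column 4 is available KB
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Possible use:
#   free_mb /srvr/YOU
```

If the number printed is well below 205, the later tar and make steps are likely to fail part-way through.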

You must put your PostgreSQL source code and installed executables under /srvr/YOU. The data can be located either under /srvr/YOU or under the /tmp filesystem on the machine where you're working. You can edit, compile and execute PostgreSQL on any workstation within CSE. You can also use the server grieg; you must not use any of the other general-purpose servers (such as wagner, weill, etc.) for running PostgreSQL servers. You will need to configure things slightly differently depending on where you run PostgreSQL; how to do this is described below.

If you're doing all of this work on a laptop or PC at home, then you can configure things however you like. You will still need folders for the same three components (source code, executables, and data), but you can place them wherever you like. PostgreSQL doesn't require any special privileges to run (at least on Unix-based systems like Linux and Mac OSX), so you do not need to create a special privileged PostgreSQL user; you can run the server as yourself.

Setting Up your /srvr Directory (optional)

You should have a directory on /srvr already. If not, the way to create one is to run the following commands from any CSE workstation:

$ ssh grieg
... you are now logged into the computer called "grieg"
$ priv srvr
... create the directory /srvr/YOU
$ exit
... you are now logged off the computer called "grieg"

You should only need to do this once. Once your /srvr/YOU directory exists, repeating the above achieves nothing.

Setting up your PostgreSQL Server

Reminder: all of the commands related to compiling, running and using your PostgreSQL server run fastest on the computer called grieg. The times below are approximate; they could double or triple depending on which machine you use.

Quick summary (for experts only):

Non-experts should go straight to the detailed instructions below.

$ cd /srvr/YOU
$ tar xfj /home/cs9315/web/16s1/postgresql/src.tar.bz2
... creates/populates a directory called postgresql-9.4.6 ...
... produces no output; takes approx 2 minutes ...
$ cd postgresql-9.4.6
$ ./configure --prefix=/srvr/YOU/pgsql
... produces lots of output; takes approx 1 minute ...
$ make
... produces lots of output; takes approx 6 minutes ...
$ make install
... produces lots of output; takes approx 1 minute ...
$ cp /home/cs9315/web/16s1/postgresql/env  /srvr/YOU/env
$ edit /srvr/YOU/env
... select the appropriate PGDATA directory ...
$ source /srvr/YOU/env
$ which initdb
/srvr/YOU/pgsql/bin/initdb
$ initdb
... produces some output; takes approx 1 minute ...
$ ls $PGDATA
... gives a listing of newly-created PostgreSQL data directory ...
... including PG_VERSION, base, global ..., postgresql.conf ...
$ edit $PGDATA/postgresql.conf
... set listen_addresses = '' ...
... set max_connections = 8 ...
... set unix_socket_directories = 'name of PGDATA directory' ...
... if any of the above lines begins with '#', remove the '#'
$ which pg_ctl
/srvr/YOU/pgsql/bin/pg_ctl
$ pg_ctl start -l $PGDATA/log
server starting
$ psql -l
                           List of databases
   Name    | Owner | Encoding  | Collation | Ctype | Access privileges 
-----------+-------+-----------+-----------+-------+-------------------
 postgres  | YOU   | LATIN1    | C         | C     | 
 template0 | YOU   | LATIN1    | C         | C     | =c/YOU
                                                   : YOU=CTc/YOU
 template1 | YOU   | LATIN1    | C         | C     | =c/YOU
                                                   : YOU=CTc/YOU
(3 rows)
$ ... use your PostgreSQL server e.g. create example database ...
$ pg_ctl stop
waiting for server to shut down.... done
server stopped
$

Note that the above times may be less on a home computer where you are accessing its local disk.

Installation Details (for non-experts):

Setting up directories

The first step is to make sure that the directory /srvr/YOU exists. You can check this via the command:

$ ls -l /srvr/YOU

If the above command says something like "No such file or directory", then you should create it using the instructions above.

Once you have a directory on the /srvr filesystem, the next step is to place a copy of the PostgreSQL source code under this directory. The following commands will do this:

$ cd /srvr/YOU
$ tar xfj /home/cs9315/web/16s1/postgresql/src.tar.bz2

This creates a subdirectory called postgresql-9.4.6 under your /srvr/YOU directory and unpacks all of the source code there. This produces no output and will take around 2 minutes to complete. If you want to watch as tar unpacks the files, use xvfj instead of xfj as the first argument to tar.

Initial compilation

Once you've unpacked the source code, you should change into the newly created postgresql-9.4.6 directory and configure the system so that it uses the directory /srvr/YOU/pgsql to hold the executables for your PostgreSQL server. (Note that /srvr/YOU/pgsql does not exist yet; it will be created in the make install step). The following commands will do the source code configuration:

$ cd /srvr/YOU/postgresql-9.4.6
$ ./configure --prefix=/srvr/YOU/pgsql

The configure command will print lots of messages about checking for various libraries/modules/etc. This process will take around 1 minute, and should produce no errors.

Once you have configured the source code, the next step is to build all of the programs:

$ cd /srvr/YOU/postgresql-9.4.6
$ make

This compiles all of the PostgreSQL source code, and takes around 6-7 minutes (depending on the load on grieg). It will produce lots of output, but should compile everything ok and end with the message:

All of PostgreSQL successfully made. Ready to install.
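
Since make produces a lot of output, one option is to save it to a file and check for this final message afterwards. The following is a sketch only; build_succeeded and the file name make.log are names invented for this example.

```shell
# build_succeeded LOGFILE: succeed if the saved make output ends with the
# success message quoted above. Hypothetical helper for illustration.
build_succeeded() {
    tail -n 5 "$1" | grep -q "All of PostgreSQL successfully made"
}

# Possible use, from inside the postgresql-9.4.6 directory:
#   make > make.log 2>&1
#   build_succeeded make.log && echo "ready for: make install"
```

If the check fails, the end of make.log should show the first compilation error.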

Installing executables

Once the PostgreSQL programs are compiled, you need to install them. The following command does this:

$ cd /srvr/YOU/postgresql-9.4.6
$ make install

This creates the directory /srvr/YOU/pgsql and copies all of the executables (such as pg_ctl and psql) under that directory. It will take 1-2 minutes to do this, and will produce quite a bit of output while it's doing it. Ultimately, it should end with the message:

PostgreSQL installation complete.

Data directories

You're not finished yet, however, since PostgreSQL has no directory in which to store all of its data. There are two possibilities in how to proceed at this stage:

you could install the data directories under /srvr/YOU/pgsql, which has the advantage that you can leave them there permanently, but has the disadvantage that building databases will be relatively slow
you could install the data directories under /tmp, which has the advantage that it's much faster to manipulate databases, but has the disadvantage that you'll need to: (a) re-create the data directories each time you want to use PostgreSQL and (b) ensure that you stop the server and remove the data before you log out

We discuss both possibilities below.

Before doing anything with the database, however, you need to ensure that your Unix environment is set up correctly. We have written a small script called env that will do this. In this set up stage, you should copy this script to your /srvr directory:

$ cp   /home/cs9315/web/16s1/postgresql/env   /srvr/YOU

The env script contains the following:

PGHOME=/srvr/$USER/pgsql
export PGDATA=/tmp/pgsql.$USER
export PGDATA=$PGHOME/data
export PGHOST=$PGDATA
export PGPORT=5432
export LD_LIBRARY_PATH=$PGHOME/lib
export PATH=$PGHOME/bin:/home/cs9315/bin:$PATH

This script sets up a number of environment variables. The critical ones are:

PGDATA : which tells the PostgreSQL server where its data directories are located
PGHOST : which tells PostgreSQL clients where to find the socket files for connecting to the server

Note that there are two definitions for PGDATA. The second one is the default and will use data directories under /srvr/YOU/pgsql. If you want to put the data directories under /tmp instead, simply swap the two export PGDATA=... lines.
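
As an alternative to swapping lines by hand, the choice could be made by a small shell function. This is a sketch under assumptions: use_pgdata is a made-up name, it is not part of the supplied env script, and it sets only the two critical variables.

```shell
# use_pgdata WHERE: set PGDATA and PGHOST for WHERE = "tmp" or "srvr".
# Hypothetical helper; the paths follow the two PGDATA lines in the env script.
use_pgdata() {
    PGHOME=/srvr/$USER/pgsql
    case "$1" in
        tmp)  export PGDATA=/tmp/pgsql.$USER ;;   # fast, but not persistent
        srvr) export PGDATA=$PGHOME/data ;;       # persistent, but slower
        *)    echo "usage: use_pgdata tmp|srvr" >&2; return 1 ;;
    esac
    export PGHOST=$PGDATA   # clients look for the socket files in PGDATA
}
```

For example, use_pgdata tmp would give the same PGDATA as the first export line, and use_pgdata srvr the same as the second.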

What's the difference between the two ways of setting up the data directory? ...

If you use /tmp for the data, you will need to create the data directories and edit the PostgreSQL configuration file each time you have a session with PostgreSQL. It is also essential that you stop the server and remove the data directories at the end of your session in this case. If you use /srvr for the data, it will persist between your login sessions, so you have less setup each time but all of your interaction with the database will be slower.

Note that in the discussion below, we will use the string YOUR_PGDATA to refer to the value that you assigned to PGDATA in your env file and which has been set by source'ing the env file in your shell.

The precise combination of values in the env file depends on where you are running the server. Here are the suggested configurations:

Running server on grieg

You can put the data directories on either /srvr or /tmp. However, you may need to change the PGPORT value, since the port-space is shared and someone else might already be using port 5432. You will detect this when you try to run the server and it fails to start (check the $PGDATA/log file if your server will not start).

Running server on CSE lab workstation

You will need to put the data directories under /tmp. You may need to change the PGPORT value, if some anti-social COMP9315 student has left their PostgreSQL server running on your workstation and it is using port 5432.

Initialising data directories and running server

Once you have a copy of the env script and have set the values appropriately, you need to invoke it in every shell window where you plan to interact with the database. You can do this by explicitly running the following command in each window:

$ source /srvr/YOU/env

If that gets tedious, you might consider adding the above command to your .bash_profile script.
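
If you do this, it is safer to source the file only when it is actually readable, so that logging in still works on machines where /srvr is not available. A minimal sketch follows; the function name source_if_present is invented for this example.

```shell
# source_if_present FILE: source FILE into the current shell if it is readable,
# and silently do nothing otherwise. Hypothetical helper for illustration.
source_if_present() {
    if [ -r "$1" ]; then
        . "$1"
    fi
}

# Possible line for .bash_profile (replace YOU as usual):
#   source_if_present /srvr/YOU/env
```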

Once you've set up the environment, check that it's ok via the following commands:

$ echo $PGHOME
/srvr/YOU/pgsql
$ echo $PGDATA
YOUR_PGDATA ... i.e. whatever value you set it to ...
$ which initdb
/srvr/YOU/pgsql/bin/initdb
$ which pg_ctl
/srvr/YOU/pgsql/bin/pg_ctl

If the system gives you different path names to the above, then your environment is not yet set up properly. Are you sure that you source'd your env file?
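
The same check can be written as a small function, which is handy if you want to test several commands at once. This is only a sketch; check_tool is a made-up name, and it assumes that PGHOME has been set by source'ing the env file.

```shell
# check_tool NAME: succeed if NAME resolves to $PGHOME/bin/NAME on the PATH.
# Hypothetical helper; PGHOME is the variable set by the env script.
check_tool() {
    found=$(command -v "$1") || { echo "$1: not found on PATH"; return 1; }
    if [ "$found" = "$PGHOME/bin/$1" ]; then
        echo "$1: ok"
    else
        echo "$1: found $found, expected $PGHOME/bin/$1"
        return 1
    fi
}

# Possible use after sourcing the env file:
#   for t in initdb pg_ctl psql; do check_tool "$t"; done
```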

If all of the above went as expected, you are now ready to create the data directories and run the server. You can do this via the command:

$ initdb
... some output eventually finishing with something like ...
Success. You can now start the database server using:

    postgres -D YOUR_PGDATA
or
    pg_ctl -D YOUR_PGDATA -l logfile start

If you look at your data directory now, you should see something like:

$ ls $PGDATA
PG_VERSION  pg_clog        pg_multixact  pg_stat_tmp  pg_twophase
base        pg_hba.conf    pg_notify     pg_subtrans  pg_xlog
global      pg_ident.conf  pg_serial     pg_tblspc    postgresql.conf

You shouldn't start the server straight away, however, since there's one more bit of configuration needed. You need to edit the postgresql.conf file in the $PGDATA directory and change the values of the following:

change the value of the listen_addresses parameter
reduce the value of max_connections (saves resources)
set the value of the unix_socket_directories parameter

Once you're done, the modified part of the postgresql.conf file should look like the following (the changed lines are the ones for the three parameters listed above):

#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------

# - Connection Settings -

listen_addresses = ''		# what IP address(es) to listen on;
					# comma-separated list of addresses;
					# defaults to 'localhost', '*' = all
					# (change requires restart)
#port = 5432				#(change requires restart)
max_connections = 8			# (change requires restart)
# Note:  Increasing max_connections costs ~400 bytes of shared memory per 
# connection slot, plus lock space (see max_locks_per_transaction).
#superuser_reserved_connections = 3	# (change requires restart)
unix_socket_directories = 'YOUR_PGDATA'
#unix_socket_group = ''			# (change requires restart)
#unix_socket_permissions = 0777		# begin with 0 to use octal notation
					# (change requires restart)

Note that it doesn't matter that the file says port=5432. This value will be overridden by whatever you set your PGPORT environment variable to.

Note also that the 5432 doesn't matter because the # at the start of the line means that it's a comment. In the case of the lines that you are supposed to change, make sure that you remove the # from the start of those lines.
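
If you use /tmp for your data, you will be making these three edits at the start of every session, so you may prefer to script them. The following is a sketch only: set_conf is a made-up helper (not part of PostgreSQL or of the course tools), it rewrites only a line that already exists in the file, and it drops any trailing comment on that line.

```shell
# set_conf FILE NAME VALUE: rewrite the existing "NAME = ..." or "#NAME = ..."
# line in FILE as "NAME = VALUE". Hypothetical helper for illustration.
set_conf() {
    sc_tmp=$(mktemp) || return 1
    awk -v name="$2" -v value="$3" '
        # match the parameter at the start of a line, commented out or not
        $0 ~ "^#?[ \t]*" name "[ \t]*=" { print name " = " value; next }
        { print }
    ' "$1" > "$sc_tmp" && cat "$sc_tmp" > "$1"
    sc_status=$?
    rm -f "$sc_tmp"
    return $sc_status
}

# Possible use, once the env file has been sourced:
#   set_conf "$PGDATA/postgresql.conf" listen_addresses "''"
#   set_conf "$PGDATA/postgresql.conf" max_connections 8
#   set_conf "$PGDATA/postgresql.conf" unix_socket_directories "'$PGDATA'"
```

It is still worth opening the file in an editor the first time, to confirm that the result matches the example above.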

Everything is now ready to start your PostgreSQL server, which you can do via the command:

$ pg_ctl start -l $PGDATA/log

Note that PostgreSQL says "server starting", whereas it should probably say "attempting to start server". It is possible that the server may not start correctly. If the server does not appear to have started, you can check why by looking at the tail of the server log:

$ tail -20 $PGDATA/log
... information about what happened at server start-time ...

Note that you'll get error messages about not being able to run the statistics collector, and a warning that autovacuum was not started. These are not an issue at this stage.
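
Starting the server and checking the log can be combined into one step. The sketch below uses pg_ctl's -w option, which makes pg_ctl wait for the start-up to finish and report failure through its exit status; the function name start_server is invented for this example, and it assumes PGDATA has been set by your env file.

```shell
# start_server: start the server, wait for it, and show the log tail on failure.
# Hypothetical helper; relies on PGDATA and on pg_ctl's -w (wait) option.
start_server() {
    if pg_ctl -w start -l "$PGDATA/log"; then
        echo "server started"
    else
        echo "server did not start; last lines of $PGDATA/log:"
        tail -n 20 "$PGDATA/log"
        return 1
    fi
}
```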

A quick way to check whether the server is working is to run the command:

$ psql -l
                           List of databases
   Name    | Owner | Encoding  | Collate | Ctype | Access privileges 
-----------+-------+-----------+---------+-------+-------------------
 postgres  | YOU   | LATIN1    | C       | en_AU | 
 template0 | YOU   | LATIN1    | C       | en_AU | =c/YOU
                                                 | YOU=CTc/YOU
 template1 | YOU   | LATIN1    | C       | en_AU | =c/YOU
                                                 | YOU=CTc/YOU
(3 rows)

which will give you a list of databases like the above if the server is running. If the server is not running, you'll get a message something like:

psql: could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "YOUR_PGDATA/.s.PGSQL.5432"?

If this happens, you should check the log file to find out what went wrong. (Other things to check in case of problems are described below).
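
The socket name in that message is built from PGHOST and PGPORT, so a quick first check is whether that file exists at all. A minimal sketch follows; both function names are invented for this example, and 5432 is used when PGPORT is not set.

```shell
# socket_file: print the socket path that psql will try, from PGHOST and PGPORT.
socket_file() {
    echo "${PGHOST}/.s.PGSQL.${PGPORT:-5432}"
}

# server_socket_present: succeed if that path exists and is a socket.
server_socket_present() {
    [ -S "$(socket_file)" ]
}

# Possible use:
#   server_socket_present || echo "no socket at $(socket_file)"
```

If the socket is missing, either the server is not running or it was started with a different PGDATA or PGPORT from the one in your current shell.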

Assuming that the server is running ok, you can now use it to create and manipulate databases (see the example below). Once you've finished your session using PostgreSQL, you need to stop the server.

$ pg_ctl stop
waiting for server to shut down.... done

If you still have a process that's using the database (e.g. a psql process in another window), then the server won't be able to shut down. You'll need to quit all of the processes that are accessing the database before the above command will work.

If you put your data under /tmp, you must also remove the data directories. You can do this via the command:

$ rm -r /tmp/pgsql.YOU
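
Because rm -r is unforgiving, you might wrap it in a check that PGDATA really points at a /tmp/pgsql.* directory, so that a persistent data directory under /srvr is never removed by mistake. This is a sketch only; remove_tmp_pgdata is a made-up name.

```shell
# remove_tmp_pgdata: remove $PGDATA, but only if it looks like /tmp/pgsql.NAME.
# Hypothetical helper for illustration; stop the server before calling it.
remove_tmp_pgdata() {
    case "$PGDATA" in
        /tmp/pgsql.?*)
            rm -r "$PGDATA" ;;
        *)
            echo "refusing to remove '$PGDATA': not a /tmp/pgsql.* directory" >&2
            return 1 ;;
    esac
}
```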

The `pgs` script

Since the above process is rather fiddly, we have provided a script that gives you a single command to set up your data directory (if needed) and start your server. It still requires you to set the values in your env file appropriately, however. The script is called pgs and is located in the directory /home/cs9315/bin.

The pgs script is designed to help you manage your PostgreSQL servers and do a bit of error checking along the way to see if everything is ok. It has four possible arguments:

setup : create a new PGDATA directory (complains if one already exists)
cleanup : remove the PGDATA directory (make sure you back up anything important before doing this)
start : start your PostgreSQL server (waiting until it actually starts ok)
stop : stop your PostgreSQL server (waiting until it actually stops ok)

The pgs script is just a wrapper around two of the PostgreSQL commands mentioned above:

initdb : sets up the PGDATA directory
pg_ctl : controls the operation of the PostgreSQL server

As noted above, the pgs script has four modes of operation:

setting up the data directory:

If you leave your data under /srvr/YOU/pgsql, then you only need to do this once. If your data is on /tmp, you will need to do this each time you want to have a session using PostgreSQL.

$ pgs setup
Using PostgreSQL with data directory /your/PGDATA/directory
The files belonging to this database system will be owned by user "YOU".
This user must also own the server process.

Running this command should eventually produce the output:

Success. You can now start the database server using:

    postgres -D YOUR_PGDATA
or
    pg_ctl -D YOUR_PGDATA -l logfile start

After doing the above, your PostgreSQL server is ready to start and use.

starting the PostgreSQL server:

$ pgs start
Using PostgreSQL with data directory YOUR_PGDATA
waiting for server to start...... done
server started
Check whether the server started ok via the command 'psql -l'.
If it's not working, check YOUR_PGDATA/log for details.

If the "waiting for server to start" is followed by an ever-growing sequence of dots, it means that the server is not starting properly. You'll need to do some additional debugging (see below) for such cases.

stopping the PostgreSQL server:

The following command stops the PostgreSQL server:
```
$ pgs stop
Using PostgreSQL with data directory YOUR_PGDATA
waiting for server to shut down.... done
```
If you get an ever-growing sequence of dots, it means that the server cannot shut down. This is typically caused by some other process being connected to your PostgreSQL server (e.g. a psql process running in another window).

cleaning (removing) the data directory:

You only need to do this if you are not keeping your databases between sessions with PostgreSQL, i.e. because you have put the data directory under /tmp.
```
$ pgs cleanup
Using PostgreSQL with data directory YOUR_PGDATA
This will remove all files under YOUR_PGDATA
Do you want to continue? y
```
If you decide that you really don't want to remove the data directories, type anything other than y or yes and the cleanup will not be done. If you accidentally remove your data directory, it is easy enough to re-create an empty one using pgs setup, but any databases that it contained are gone.

A Typical session with PostgreSQL

Once you've got your PostgreSQL server installed, this is what you'd normally do to use it:

$ source /srvr/YOU/env
$ pgs setup
... BUT ONLY if your PGDATA directory is on /tmp ...
$ pgs start
... hopefully concluding with the message ...
server started
$ psql -l
... hopefully giving a list of databases ...
$ createdb myNewDB
$ psql myNewDB
... do stuff with your database ... 
$ pgs stop
... hopefully concluding with the message ...
server stopped
$ pgs cleanup
... BUT ONLY if your PGDATA directory is on /tmp ...

Reminder

You must shut down your server at the end of each session with PostgreSQL if you're working on the CSE workstations. Failure to do this means that the next student who uses that workstation may need to adjust their configuration (after first working out what the problem is) in order to start their server.
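
One way to make forgetting less likely is to wrap a piece of work between a start and a stop. The sketch below is not part of the pgs script; with_server is a made-up name, and it assumes that pgs start exits with a non-zero status when the server fails to start, which the description of pgs above does not confirm.

```shell
# with_server CMD...: start the server, run CMD, then always stop the server.
# Hypothetical helper; assumes "pgs start" reports failure via its exit status.
with_server() {
    pgs start || return 1
    "$@"
    ws_status=$?
    pgs stop
    return $ws_status
}

# Possible use:
#   with_server psql beer
```

The exit status of the wrapped command is passed back, so a failing command is still visible after the server has been stopped.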

A Sample Database

Once your server is up-and-running, you ought to load up the small sample database (on beer) and try a few queries on its data. This is especially important if you haven't used PostgreSQL before; you need to get used to its interactive interface.

You can set up the beer database as follows:

$ createdb beer
$ psql beer -f  /home/cs9315/web/16s1/pracs/p01/beer.dump
... around 20 lines including SET, CREATE TABLE, ALTER TABLE...
$ psql beer
psql (9.4.6)
Type "help" for help.

beer=# select count(*) from beers;
 count 
-------
    24
(1 row)

beer=# \d
... gives a list of tables in the database ...
beer=#
... explore/manipulate the database ...
beer=# \q
$

For exploring the database with psql, there is a collection of \d commands. You can find out more about these via psql's \? command or by reading the PostgreSQL manual.
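
Queries like the count(*) example can also be run without entering the interactive interface, using psql's -c option. The sketch below adds -A (unaligned output) and -t (tuples only) so that just the number is printed; count_rows is a made-up name, and the example use assumes the beers table from the sample database.

```shell
# count_rows DB TABLE: print the number of rows in TABLE as a bare number.
# Hypothetical helper; -A and -t strip psql's headers and table decoration.
count_rows() {
    psql -At -c "select count(*) from $2;" "$1"
}

# Possible use with the sample database:
#   count_rows beer beers
```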