CS180 - Database Systems
Fall 2001, Prof. Arthur Keller

Project Part #3
Due midnight at the end of Thursday January 31, 2002

LOGISTICS AND LATE POLICY REMINDER

THE PROJECT

Personal Database Application, Part 3

In this part of the project, you will create a relational schema for your PDA in the PostgreSQL database system, and you will populate the tables in your database with initial data sets.

(a) Familiarize yourself with the PostgreSQL relational DBMS by reading the document Managing a Database in the Interactive PostgreSQL documentation, logging into PostgreSQL, trying some of the examples in the document, and experimenting with the various commands. You don't need to turn anything in for this part. The PostgreSQL web page has links to documentation and other information. See PostgreSQL: Introduction and Concepts, by Bruce Momjian, which is available in printed form at the bookstore and on reserve at the library. There's also a PostgreSQL At A Glance document describing PostgreSQL's features.

Create your first database using the shell command:

createdb username

where "username" is your cats login username.

Then you can access your database with the command:

psql -a
which will start a command line interface to PostgreSQL connected to the database with your username. (The -a is so input is echoed to the console for saving a script log. If you don't want your input echoed, you can omit the -a, but please do include it when you are running an execution script to hand in.)

You may create other database names, but they must all start with your username. (By the way, username "wan" should not create databases starting with "wang"!)

To start a command line interface to PostgreSQL connecting to a different database, use the command:

psql -a databasename

where "databasename" is the name of your database (which should begin with your cats username).

(b) Create relations for your PDA based on your final relational schema from PDA Part 2. Use the CREATE TABLE command to specify each relation its attributes and attribute types; see CREATE TABLE from the PostgreSQL interactive documentation. Many of the attribute types supported by PostgreSQL are listed on page 286 of the textbook. If you have an attribute that represents a date and/or time, you may want to look at the page on FAQ: Working with Dates and Times in PostgreSQL.

Turn in a script log showing an PostgreSQL session in which your relations are created successfully. Please see Recording Your Session below for details.

(c) For each relation in your PDA, create a execution script file containing a Copy Command and a few (approximately 5-10) records of "realistic" data. Then execute the script file from the psql command line using the \i option. Please see Creating execution script files below for more details.

Turn in a listing showing the contents of the files you created, the successful loading of the data into PostgreSQL, and the execution of "SELECT *" commands to show the contents of each relation.

(d) Write a program in any programming language you like that creates large files of records for each of your PDA relations. If you have available real data for your PDA, then your program will need to transform the data into files of records conforming to your PDA schema and to PostgreSQL's load format. The rest of you will need to write a program to fabricate data: your program will generate either random or nonrandom (e.g., sequential) records conforming to your schema. Note that it is both fine and expected for your data values - strings especially - to be meaningless gibberish. The point of generating large amounts of data is so that you can experiment with a database of realistic size, rather than the small "toy" databases often used in classes. The data you generate and load should be on the order of:

If your application naturally includes relations that are expected to be relatively small (e.g., schools within a university), then it is fine to use some small relations, but please ensure that you have relations of the sizes prescribed above as well. When writing a program to fabricate data, there are two important points to keep in mind:

  1. Make sure not to generate duplicate values for key attributes.

  2. Your PDA almost certainly includes relations that are expected to join with each other. For example, you may have a Student relation with attribute courseNum that's expected to join with attribute number in relation Course. When generating data, be sure to generate values that actually do join - otherwise all of your interesting queries will have empty results! There are a couple of ways to properly generate joining values. One way is to generate records for multiple relations (e.g., Course and Student) at the same time. Another way is to generate the records for one relation first, and then use the joining values for the other relation. For example, you could generate records for relation Course first, then use the Course.number values when creating values for Student.courseNum.

Turn in your program code for generating or transforming data, a small sample of the records generated for each relation (5 or so records per relation), and a script log showing the successful loading of your data into PostgreSQL.

What to turn in

Please see Recording Your Session below for a guide to preparing output to be submitted for this and subsequent project parts.

Components (b), (c), and (d) of this project part each tell you what should be recorded in the script log that you turn in. In this and all subsequent project parts, the material you turn in should be clearly formatted and delineated, and should include comments for any aspects that are not crystal clear. Poorly assembled or documented material will not receive full credit, even if it is correct. You also will not receive full credit if you turn in your entire large data files (or large query results in later assignments) when we ask for small samples. Other than comments, truncation, and simple formatting, it is Academic Dishonesty to edit scripts before turning them in.

For this assignment the following files should be turned in electronically using your cats account and the submit program:
README Please give your name, project part number, course number, date, the title of your project, a list and description of the files you are submitting, and any other information that will be useful for the grader.
create.log A record of your session creating your tables.
data.script The execution script file(s) you used to populate your small database. If you used a separate file for each table use the table name to begin the file name: like product_data.script (for relation product)
data.log script log (i.e., a recording) of your session using data.script to populate your DB and your execution of SELECT queries on each table.
generate.x The program code to generate your large data (x = c, or java or perl...)
bigdata.script Snippets from each of the the The script file(s) produced by your generate program and used to populate your big database. If you used a separate file for each table use the table name to begin the file name: like product_bigdata.script (for relation product)
bigdata.log-part Representative portions of the record of your use of script files to populate your big data base. Do not include the entire script log file. It will waste space and annoy the grader. Instead include only the commands and a few records for each relation as well as the psql messages.

To submit these files (assuming you used java to generate your data) use the syntax:

  submit cmps180-ark.w02 proj3 create.log  data.script data.log generate.java bigdata.script bigdata.log-part

or proj3-late if your project part is late.

Maintaining your databases

You will be using both your small (part c) and large (part d) databases for the rest of the course. The idea is to use the small database to experiment on meaningful-looking data, and the large one to experiment on data of more realistic size. We suggest that you keep the two databases going in parallel by creating two sets of files and two sets of relations with slightly different names.

For the duration of the project, we suggest that you establish some kind of routine that includes reloading your database from the files created in this project part each time you want to get a "fresh" start with PostgreSQL. Remember to delete the contents of each relation (or destroy and recreate the relations) before reloading. Otherwise, unless there is a declared key (or you take APPEND out of your control file), PostgreSQL will happily append the new data to your old relation, causing your relation size to double, triple, quadruple, etc. To get rid of a table called T, issue the command:

  drop table T;
If you want to get rid of all tuples in T without deleting the table itself, issue the command:
  delete from T;

Recording Your Session

There are several methods for creating a typescript to turn in for your programming assignments. The most primitive way is to cut and paste your terminal output and save it in a file (if you have windowing capabilities). Another method is to use the Unix command script to record the terminal interaction. The script command records everything printed on your screen. The syntax for the command is
    script [ -a ] [ filename ]
The record is written to filename. If no file name is given, the record is saved in the file typescript. The -a option allows you to append the session record to filename, rather than overwrite it. To end the recording, type
    exit
For more information about script, check out its man page.

Creating and running script files from psql

You will be using the psql command line interface to interact with your data base. See Managing a Database for information on starting using psql. Script files are text files of psql commands which can be executed like a batch file using the the \i command in psql. The syntax is \i filename entered following the psql prompt, where file name is the complete (case sensitive) name of the script file you desire to run.

To run your execution script data.script and save the script log in the file data.log, do the following:
script data.log to start saving the script log
psql -a databasename to run PostgreSQL's command line inteface using your database
\i data.script to import an execution script
\q to exit pqsl (PostgreSQL's command line interface)
exit to stop saving the script log

The execution script file you create can consist of most any series of commands which you could enter following the psql prompt. This includes all of the SQL commands which you will be using to create, modify and test your data base. Examples include the CREATE TABLE and SELECT commands. Just like when using the psql command line interface you must terminate each SQL command in your script file with a semicolon. \i and \q do not need semicolons.

If you are recording your session using the script command described above into a script log then it is useful to start psql using the -a option so that all commands included in your execution script file will be echoed to the console and thus to your script log file. Here is an example script file that creates a table (relation) named products:

CREATE TABLE products ( 
	productID INT, 
	name VARCHAR(80), 
	price NUMERIC(10,2), 
	retailPrice NUMERIC(10,2) 
); 
Here is an example script file that loads four tuples into the table (relation) named products created using the previous script file:
COPY products FROM stdin USING DELIMITERS '|'; 
1419|American Greetings CreataCard Gold V4.0|21.49|25.24
1424|Barbie(R) Nail Designer(TM)|20.74|25.99
1427|Panzer Commander|21.99|30.24
1431|Riven: The Sequel to Myst|31.99|40.24
\.
This is the format you will use for the files that load data into your tables. The USING DELIMITERS '|' and the use of '|' as a delimiter is optional. The default delimiter is the tab character. The delimited data on each line must match the attributes and their types in your table in a one to one manner and in the order they were defined in your CREATE TABLE commands. The COPY data must be terminated with '\.'

For testing, some students have found it convenient to have a separate script file to populate each of the tables. I also find it convenient have a single script file to create all of my tables and another one to drop all of my tables. Examples follow.

Sample Script Files

createbeers.script
-- Sample Script file to Create and 
-- Populate a BEERS DB  
 
-- print out the current time
SELECT timeofday();

CREATE TABLE Beers (
name VARCHAR(30),
manf VARCHAR(50)
);
COPY Beers FROM stdin USING DELIMITERS '|';
Coors|Adolph Coors
Coors Lite|Adolph Coors
Miller|Miller Brewing
Miller Lite|Miller Brewing
MGD|Miller Brewing
Bud|Anheuser-Busch
Bud Lite|Anheuser-Busch
Michelob|Anheuser-Busch
Anchor Steam|Anchor Brewing
\.
CREATE TABLE Bars (
name VARCHAR(30),
addr VARCHAR(50),
license VARCHAR(50)
);
COPY Bars FROM stdin USING DELIMITERS '|';
Joe's|123 Any Street|B7462A
Sue's|456 My Way|C5473S
\.
CREATE TABLE Sells (
bar VARCHAR(20),
beer VARCHAR(30),
price REAL
);
COPY Sells FROM stdin USING DELIMITERS '|';
Joe's|Coors|2.50
Joe's|Bud|2.50
Joe's|Bud Lite|2.50
Joe's|Michelob|2.50
Joe's|Anchor Steam|3.50
Sue's|Coors|2.00
Sue's|Miller|2.00
\.
CREATE TABLE Drinkers (
name VARCHAR(30),
addr VARCHAR(50),
phone CHAR(16)
);
COPY Drinkers FROM stdin USING DELIMITERS '|';
Bill Jones|180 Saint St.|831-459-1812
Kelly Arthur|180 Alto Pl.|650-856-2002
Fred|1234 Fifth St.|831-426-1956
\.
CREATE TABLE Likes (
drinker VARCHAR(30),
beer VARCHAR(30)
);
COPY Likes FROM stdin USING DELIMITERS '|';
Bill Jones|Miller
Bill Jones|Michelob
Kelly Arthur|Anchor Steam
Fred|MGD
\.
CREATE TABLE Frequents (
drinker VARCHAR(30),
bar VARCHAR(30)
);
COPY Frequents FROM stdin USING DELIMITERS '|';
Bill Jones|Joe's
Bill Jones|Sue's
Kelly Arthur|Joe's
\.

-- Execute some SELECT queries--
SELECT * FROM Bars;
SELECT * FROM Drinkers;

-- print out the current time
SELECT timeofday();

dropbeers.script
DROP TABLE Beers;
DROP TABLE Bars;
DROP TABLE Sells;
DROP TABLE Likes;
DROP TABLE Frequents;
DROP TABLE Drinkers;