Remember that a completely separate deadline and late policy applies for this and subsequent project parts:
For all students: Programming work must be submitted electronically by midnight at the end of the day that it is due. Programming work submitted after the deadline but before the "late deadline" -- midnight 48 hours after the deadline -- will be accepted but penalized 50%, except for the first two late project parts handed in late and the first two homework assignments handed in late. No programming work will be accepted after the late deadline.
One caveat: students must use ssh (both versions 1 and 2 will work), as telnet is not enabled. SSH is an encrypted (safer) method of connecting to a remote host, as opposed to telnet which transmits and receives data unencrypted. Free and legitimate SSH software is available for almost all computing platforms.
Once connected to linux.ic.ucsc.edu, students will have full access to their CATS home directories via AFS, much the same way as when logged onto teach.ic.ucsc.edu and hawking.ic.ucsc.edu. The major different is that the binaries (including those for PostgreSQL) are Linux binaries, and thus will not work on the Sun hosts such as teach/learn/hawking/curie.
(a) Familiarize yourself with the PostgreSQL relational DBMS by reading the document Managing a Database in the Interactive PostgreSQL documentation, logging into PostgreSQL, trying some of the examples in the document, and experimenting with the various commands. You don't need to turn anything in for this part. The PostgreSQL web page has links to documentation and other information. See PostgreSQL: Introduction and Concepts, by Bruce Momjian, which is available in printed form at the bookstore and on reserve at the library. There's also a PostgreSQL At A Glance document describing PostgreSQL's features.
Create your first database using the shell command:
createdb username
where "username" is your cats login username.
Then you can access your database with the command:
psql -awhich will start a command line interface to PostgreSQL connected to the database with your username. (The -a is so input is echoed to the console for saving a script log. If you don't want your input echoed, you can omit the -a, but please do include it when you are running an execution script to hand in.)
You may create other database names, but they must all start with your username. (By the way, username "wan" should not create databases starting with "wang"!)
To start a command line interface to PostgreSQL connecting to a different database, use the command:
psql -a databasename
where "databasename" is the name of your database (which should begin with your cats username).
(b) Create relations for your PDA based on your final relational schema from PDA Part 2. Use the CREATE TABLE command to specify each relation its attributes and attribute types; see CREATE TABLE from the PostgreSQL interactive documentation. Many of the attribute types supported by PostgreSQL are listed on page 286 of the textbook. If you have an attribute that represents a date and/or time, you may want to look at the page on FAQ: Working with Dates and Times in PostgreSQL.
Turn in a script log showing an PostgreSQL session in which your relations are created successfully. Please see Recording Your Session below for details.
(c) For each relation in your PDA, create a execution script file containing a Copy Command and a few (approximately 5-10) records of "realistic" data. Then execute the script file from the psql command line using the \i option. Please see Creating execution script files below for more details.
Turn in a listing showing the contents of the files you created, the successful loading of the data into PostgreSQL, and the execution of "SELECT *" commands to show the contents of each relation.
(d) Write a program in any programming language you like that creates large files of records for each of your PDA relations. If you have available real data for your PDA, then your program will need to transform the data into files of records conforming to your PDA schema and to PostgreSQL's load format. The rest of you will need to write a program to fabricate data: your program will generate either random or nonrandom (e.g., sequential) records conforming to your schema. Note that it is both fine and expected for your data values - strings especially - to be meaningless gibberish. The point of generating large amounts of data is so that you can experiment with a database of realistic size, rather than the small "toy" databases often used in classes. The data you generate and load should be on the order of:
If your application naturally includes relations that are expected to be relatively small (e.g., schools within a university), then it is fine to use some small relations, but please ensure that you have relations of the sizes prescribed above as well. When writing a program to fabricate data, there are two important points to keep in mind:
Turn in your program code for generating or transforming data, a small sample of the records generated for each relation (5 or so records per relation), and a script log showing the successful loading of your data into PostgreSQL.
Components (b), (c), and (d) of this project part each tell you what should be recorded in the script log that you turn in. In this and all subsequent project parts, the material you turn in should be clearly formatted and delineated, and should include comments for any aspects that are not crystal clear. Poorly assembled or documented material will not receive full credit, even if it is correct. You also will not receive full credit if you turn in your entire large data files (or large query results in later assignments) when we ask for small samples. Other than comments, truncation, and simple formatting, it is Academic Dishonesty to edit scripts before turning them in.
For this assignment the following files should be turned in electronically using your cats account and the submit program:
| README | Please give your name, project part number, course number, date, the title of your project, a list and description of the files you are submitting, and any other information that will be useful for the grader. |
| create.log | A record of your session creating your tables. |
| data.script | The execution script file(s) you used to populate your small database. If you used a separate file for each table use the table name to begin the file name: like product_data.script (for relation product) |
| data.log | script log (i.e., a recording) of your session using data.script to populate your DB and your execution of SELECT queries on each table. |
| generate.x | The program code to generate your large data (x = c, or java or perl...) |
| bigdata.script | Snippets from each of the the The script file(s) produced by your generate program and used to populate your big database. If you used a separate file for each table use the table name to begin the file name: like product_bigdata.script (for relation product) |
| bigdata.log-part | Representative portions of the record of your use of script files to populate your big data base. Do not include the entire script log file. It will waste space and annoy the grader. Instead include only the commands and a few records for each relation as well as the psql messages. |
To submit these files (assuming you used java to generate your data) use the syntax:
submit cmps180-ark.w02 proj3 create.log data.script data.log generate.java bigdata.script bigdata.log-part
or proj3-late if your project part is late.
For the duration of the project, we suggest that you establish some kind of routine that includes reloading your database from the files created in this project part each time you want to get a "fresh" start with PostgreSQL. Remember to delete the contents of each relation (or destroy and recreate the relations) before reloading. Otherwise, unless there is a declared key (or you take APPEND out of your control file), PostgreSQL will happily append the new data to your old relation, causing your relation size to double, triple, quadruple, etc. To get rid of a table called T, issue the command:
drop table T;If you want to get rid of all tuples in T without deleting the table itself, issue the command:
delete from T;
script [ -a ] [ filename ]The record is written to filename. If no file name is given, the record is saved in the file typescript. The -a option allows you to append the session record to filename, rather than overwrite it. To end the recording, type
exitFor more information about script, check out its man page.
To run your execution script data.script and save the script log in the file data.log, do the following:
| script data.log | to start saving the script log |
| psql -a databasename | to run PostgreSQL's command line inteface using your database |
| \i data.script | to import an execution script |
| \q | to exit pqsl (PostgreSQL's command line interface) |
| exit | to stop saving the script log |
The execution script file you create can consist of most any series of commands which you could enter following the psql prompt. This includes all of the SQL commands which you will be using to create, modify and test your data base. Examples include the CREATE TABLE and SELECT commands. Just like when using the psql command line interface you must terminate each SQL command in your script file with a semicolon. \i and \q do not need semicolons.
If you are recording your session using the script command described above into a script log then it is useful to start psql using the -a option so that all commands included in your execution script file will be echoed to the console and thus to your script log file. Here is an example script file that creates a table (relation) named products:
CREATE TABLE products ( productID INT, name VARCHAR(80), price NUMERIC(10,2), retailPrice NUMERIC(10,2) );Here is an example script file that loads four tuples into the table (relation) named products created using the previous script file:
COPY products FROM stdin USING DELIMITERS '|'; 1419|American Greetings CreataCard Gold V4.0|21.49|25.24 1424|Barbie(R) Nail Designer(TM)|20.74|25.99 1427|Panzer Commander|21.99|30.24 1431|Riven: The Sequel to Myst|31.99|40.24 \.This is the format you will use for the files that load data into your tables. The USING DELIMITERS '|' and the use of '|' as a delimiter is optional. The default delimiter is the tab character. The delimited data on each line must match the attributes and their types in your table in a one to one manner and in the order they were defined in your CREATE TABLE commands. The COPY data must be terminated with '\.'
For testing, some students have found it convenient to have a separate script file to populate each of the tables. I also find it convenient have a single script file to create all of my tables and another one to drop all of my tables. Examples follow.
-- Sample Script file to Create and -- Populate a BEERS DB -- print out the current time SELECT timeofday(); CREATE TABLE Beers ( name VARCHAR(30), manf VARCHAR(50) ); COPY Beers FROM stdin USING DELIMITERS '|'; Coors|Adolph Coors Coors Lite|Adolph Coors Miller|Miller Brewing Miller Lite|Miller Brewing MGD|Miller Brewing Bud|Anheuser-Busch Bud Lite|Anheuser-Busch Michelob|Anheuser-Busch Anchor Steam|Anchor Brewing \. CREATE TABLE Bars ( name VARCHAR(30), addr VARCHAR(50), license VARCHAR(50) ); COPY Bars FROM stdin USING DELIMITERS '|'; Joe's|123 Any Street|B7462A Sue's|456 My Way|C5473S \. CREATE TABLE Sells ( bar VARCHAR(20), beer VARCHAR(30), price REAL ); COPY Sells FROM stdin USING DELIMITERS '|'; Joe's|Coors|2.50 Joe's|Bud|2.50 Joe's|Bud Lite|2.50 Joe's|Michelob|2.50 Joe's|Anchor Steam|3.50 Sue's|Coors|2.00 Sue's|Miller|2.00 \. CREATE TABLE Drinkers ( name VARCHAR(30), addr VARCHAR(50), phone CHAR(16) ); COPY Drinkers FROM stdin USING DELIMITERS '|'; Bill Jones|180 Saint St.|831-459-1812 Kelly Arthur|180 Alto Pl.|650-856-2002 Fred|1234 Fifth St.|831-426-1956 \. CREATE TABLE Likes ( drinker VARCHAR(30), beer VARCHAR(30) ); COPY Likes FROM stdin USING DELIMITERS '|'; Bill Jones|Miller Bill Jones|Michelob Kelly Arthur|Anchor Steam Fred|MGD \. CREATE TABLE Frequents ( drinker VARCHAR(30), bar VARCHAR(30) ); COPY Frequents FROM stdin USING DELIMITERS '|'; Bill Jones|Joe's Bill Jones|Sue's Kelly Arthur|Joe's \. -- Execute some SELECT queries-- SELECT * FROM Bars; SELECT * FROM Drinkers; -- print out the current time SELECT timeofday();
DROP TABLE Beers; DROP TABLE Bars; DROP TABLE Sells; DROP TABLE Likes; DROP TABLE Frequents; DROP TABLE Drinkers;