You are here: Vanderbilt Biostatistics Wiki>Main Web>DatabaseIntro (revision 1)EditAttach

Database Introduction

What is a Database?

  • A database is a collection of records stored in a computer in a systematic way, such that a computer program can consult it to answer questions. For better retrieval and sorting, each record is usually organized as a set of data elements (facts). The items retrieved in answer to queries become information that can be used to make decisions that might otherwise be more difficult or impossible to make.
  • A database contains a schema which is the description the type of facts held in the database. Typically the schema describes the objects that are represented in the database, and the relationships among them.
  • There are a number of different ways a schema can be organized.
    • Hierarchial - records are linked together in tree structure. Each record can have only one parent and many children.
    • Network - like Hierarchial but each record can have multiple parents and child records.
    • Relational - records are spread out amoung multiple tables, relationships between tables are not defined explicitly; instead, keys are used to match up rows of data in different tables. A key is a collection of one or more columns in one table whose values match corresponding columns in other tables and can be used to join data from multiple tables together.
  • Almost all modern databases are relational model databases. (Oracle, MySQL, PostgreSQL, DB2, and SQL Server)
  • To request data from a relational database by sending it a query that is written in a special language, usually a dialect of SQL language.

The SQL Language

  • SQL is an acronym for Structured English Query Language.
  • SQL is designed for a specific, limited purpose - querying data contained in a relational database.
  • Although SQL is defined by both ANSI and ISO, there are many extensions to and variations on the version of the language defined by these standards bodies.
    • SQL code can rarely be ported between database systems without major modifications.
    • However Once a user knows the basics of SQL, programing for other databases is easyer then learning a new language.

SQL Keywords

Connection

  • USE is used to select the current database.
    Example:
    USE webcalendar
    

Data Retrieval

  • SELECT is used to retrieve zero or more rows from one or more tables in a database. In specifying a SELECT query, the user specifies a description of the desired result set, but they do not specify what physical operations must be executed to produce that result set.
    • Commonly available keywords related to SELECT include:
      • FROM is used to indicate which tables the data is to be taken from, as well as how the tables join to each other.
      • WHERE is used to identify which rows to be retrieved, or applied to GROUP BY.
      • GROUP BY is used to combine rows with related values into elements of a smaller set of rows.
      • HAVING is used to identify which rows, following a GROUP BY, are to be retrieved.
      • ORDER BY is used to identify which columns are used to sort the resulting data.
Examples:
SELECT webcal_entry.cal_id, webcal_entry.cal_name, webcal_user.cal_firstname, 
  webcal_user.cal_lastname FROM webcal_entry, webcal_user 
  WHERE webcal_entry.cal_create_by = webcal_user.cal_login;

SELECT webcal_entry.cal_id, webcal_entry.cal_name, webcal_user.cal_firstname, 
  webcal_user.cal_lastname FROM webcal_entry, webcal_user 
  WHERE webcal_entry.cal_create_by = webcal_user.cal_login LIMIT 15;

SELECT webcal_entry.cal_id, webcal_entry.cal_name, webcal_user.cal_firstname, 
  webcal_user.cal_lastname FROM webcal_entry, webcal_user 
  WHERE webcal_entry.cal_create_by = webcal_user.cal_login 
  ORDER BY webcal_entry.cal_id LIMIT 15;

SELECT COUNT(webcal_entry.cal_id), webcal_user.cal_firstname, 
  webcal_user.cal_lastname FROM webcal_user, webcal_entry 
  WHERE webcal_entry.cal_create_by = webcal_user.cal_login 
  GROUP BY webcal_entry.cal_create_by;

SELECT COUNT(webcal_entry.cal_id), webcal_user.cal_firstname, 
  webcal_user.cal_lastname FROM webcal_user, webcal_entry 
  WHERE webcal_entry.cal_create_by = webcal_user.cal_login 
  GROUP BY webcal_entry.cal_create_by 
  HAVING COUNT(webcal_entry.cal_id) > 5;

Data Manipulation

  • INSERT is used to add zero or more rows (formally tuples) to an existing table.
    INSERT INTO webcal_entry_user (cal_id, cal_login, cal_status)
      VALUES (225, 'otheruser', 'A');
    
  • UPDATE is used to modify the values of a set of existing table rows.
    UPDATE webcal_entry_user SET cal_status = 'D' 
      WHERE cal_id = 225 AND cal_login = 'otheruser';
    
  • DELETE deletes all data from a table (non-standard, but common SQL command).
    DELETE FROM webcal_entry_user 
      WHERE cal_id = 225 AND cal_login = ;otheruser';
    

Data Definition

  • CREATE causes an object (a table, for example) to be created within the database.
    CREATE TABLE my_table
    (
     my_field1 INT UNSIGNED,
     my_field2 VARCHAR(50),
     my_field3 DATE NOT NULL,
     PRIMARY KEY (my_field1, my_field2)
    )
    
  • DROP causes an existing object within the database to be deleted, usually irretrievably.
    DROP TABLE my_table;
    
Edit | Attach | Print version | History: r2 < r1 | Backlinks | View wiki text | Edit WikiText | More topic actions...
Topic revision: r1 - 26 Oct 2005, CharlesDupont
 

This site is powered by FoswikiCopyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback