Data bases

In this lecture we’ll use database to learn and practice how to work with data. Below, you’ll find a description of each data base and some instructions how to access it.

Example data bases used in this lecture

Chinook

Description

The Chinook database represents a digital media store, including tables for artists, albums, media tracks, invoices, and customers. With a focus on the media store scenario, it provides a variety of real-world data and scenarios, such as managing the inventory, dealing with customers, and processing orders. The database is designed to cover various aspects and relationships that are common in business processes, providing a comprehensive platform to practice and analyze data. Besides the primary entities, it captures data on employees, genres, playlists, and media types, showcasing a wide spectrum of business operations in a simplified yet insightful manner.

Tables

  • album: Each album belonging to an artist.
  • artist: Artist names.
  • customer: Customer name, address, and contact details.
  • employee: Employee details, reporting structure.
  • genre: Music genre.
  • invoice: Invoice header details linked to customers.
  • invoiceline: Detail lines for each invoice linked to tracks.
  • mediatype: Different media formats.
  • playlist: Playlist names.
  • playlisttrack: Associations between playlists and tracks.
  • track: Individual track details including name, composer, and media type.

Data model

Figure 1: Chinook data model

DVD Rental

Description

The DVD rental sample database simulates the business operations of a DVD rental store. It comprises 15 tables, 1 trigger, 7 views, 8 functions, 1 domain, and 13 sequences. The database includes tables for actors, films, categories, stores, inventories, rentals, payments, staff, customers, and geographical data. Through these tables, it captures relationships such as those between films and actors or films and categories.

Tables

  • actor: actors data including first name and last name
  • film: film data such as title, release year, length, rating, etc.
  • film_actor: the relationships between films and actors
  • category: film’s categories data
  • film_category: the relationships between films and categories
  • store: store data including manager staff and address
  • inventory: inventory data
  • rental: rental data
  • payment: customer’s payments
  • staff: staff data
  • customer: customer data
  • address: address data for staff and customers
  • city: city names
  • country: country names

Data model

Figure 2: DVD Rental data model

Employees

Description

The Employees Sample Database is designed to represent a company’s organizational structure and employee records. It includes various tables that store data on employees, their titles, salaries, departments, and managerial hierarchy. This database can be utilized to practice SQL queries, explore organizational data management, and understand relationships between different entities within a corporate environment.

Tables

  • employees: Individual employee records including names, hire date, and gender.
  • dept_emp: Associations between employees and departments.
  • departments: Department details including names and department numbers.
  • dept_manager: Associations between departments and their managers.
  • titles: Employee titles and the duration they held those titles.
  • salaries: Employee salary information along with the duration they earned those salaries.

Data model

Figure 3: Employee data base model

Gravity Books

Description

The Gravity Bookstore Database encapsulates a bookstore’s operational data, encompassing books, authors, languages, publishers, customers, orders, and shipping details, providing a well-rounded view of the business processes involved in a bookstore scenario.

Tables

  • book: All books available in the store.
  • book_author: Authors for each book.
  • author: All authors.
  • book_language: Possible languages of books.
  • publisher: Publishers for books.
  • customer: Customers of Gravity Bookstore.
  • customer_address: Addresses for customers.
  • address_status: Statuses for an address.
  • address: Addresses in the system.
  • country: Countries that addresses are in.
  • cust_order: Orders placed by customers.
  • order_line: Books part of each order.
  • shipping_method: Possible shipping methods for an order.
  • order_history: History of an order.
  • order_status: Possible statuses of an order.

Data model

Figure 4: Gravity Bookstore data base model

Superheroes

Description

The Superheroes Database catalogs superhero characters, their powers, cities they protect, and enemies they face, facilitating an imaginative exploration of data relationships within a fantastical context.

Tables

  • superhero: Superhero characters and their details.
  • power: Different powers superheroes possess.
  • city: Cities superheroes protect.
  • enemy: Enemies superheroes face.
  • superhero_power: Associates superheroes with their powers.
  • enemy_superhero: Associates enemies with superheroes.

Data model

Figure 5: Superhero data base model

Olympics

Description

The Olympic Games Database captures data regarding Olympic games, events, athletes, and countries, enabling analysis and insights into the performance and participation across different Olympic games.

Tables

  • game: Details of each Olympic game.
  • event: Events within each game.
  • athlete: Athlete information.
  • country: Country information.
  • result: Results of events.
  • event_type: Types of events.

Data model

Figure 6: Olympic Games data base model

Video games

Description

The Video Games Database holds information about video games, platforms they’re available on, and their developers, facilitating a structured overview of various video game titles and their attributes.

Tables

  • game: Details about each video game.
  • platform: Platforms on which games are available.
  • developer: Developers of the games.
  • game_platform: Associates games with platforms.
  • game_genre: Associates games with genres.
  • genre: Different genres of video games.

Data model

Figure 7: Video Games data base model

Airport

Description

The airport database contains information form an import.

Tables

  • passenger: Passenger data
  • booking: Flight booking by passenger
  • booking_leg: Legs of a booking
  • boarding pass: Information of boarding pass for booking
  • airport: Airport information
  • flight: The flight
  • aircraft: Aircraft used for a flight
  • account: Customer account of a passenger
  • frequent_flyer: Information on frequent flyers
  • phone: Phone numbers

Data model

Figure 8: Airport data base model

New York City Flights 2013

Description

The nycflights13 database contains all flights departing from the three airports in New York in 2013.

Tables

  • flights: Flights information
  • airports: Airport data
  • planes: Planes used for the flights
  • airlines: Airline names
  • weather: Weather data at the three New York City airports

Data model

Figure 9: New York City Flights 2014 data base model

IMDb

Description

The IMDb is a moderately large, real database of movies.

Tables

  • directors: Table of movie directors
  • movies: Movies
  • actors: Actors
  • roles: Actors’ roles in a movie
  • directors_genres: Relation between directors and genres
  • movies_genres: Genre of each movie
  • movies_directors:Director of each movie

Data model

Figure 10: IMDb film database model

Airbnb

Description

The airbnb database contains real data from all Airbnb listings in Berlin on 16 September, 2023.

Tables

  • listings: Table of all listed accomodations
  • calendar: Calendar of bookings per listing
  • hosts: Host details
  • reviews: Reviews per listing
  • reviewers: Reviewers

Data model

Accessing the data

Using Python in JupyterHub

From the FH JupyerHub, all databases and tables are easily accessible using a few pre-defined Python functions. You can get all tables from a database using the function get_all_tables and submitting the database’s name (e.g. "chinook"):

# Extracting all tables from the database into an object
chinook = get_all_tables("chinook")

The method list_tables() will return a list of all tables that were extracted:

# Listing all tables
chinook.list_tables()
['albums',
 'artists',
 'customers',
 'employees',
 'categories',
 'film_actor',
 'actors',
 'film_category',
 'films',
 'genres',
 'invoice_lines',
 'invoices',
 'media_types',
 'playlist_track',
 'playlists',
 'tracks']

To see and use the content of a table from the database, just access it using the notation databaseobject.tablename:

# Accessing a table
chinook.actors
id first_name last_name last_update
0 1 Penelope Guiness 2013-05-26 14:47:57.620
1 2 Nick Wahlberg 2013-05-26 14:47:57.620
2 3 Ed Chase 2013-05-26 14:47:57.620
3 4 Jennifer Davis 2013-05-26 14:47:57.620
4 5 Johnny Lollobrigida 2013-05-26 14:47:57.620
... ... ... ... ...
198 199 Julia Fawcett 2013-05-26 14:47:57.620
199 200 Thora Temple 2013-05-26 14:47:57.620
200 201 Rob Conery 2014-02-25 09:32:52.174
201 202 Joe Bucket 2014-02-25 09:41:36.420
202 203 Bip Donk 2014-02-25 09:41:45.037

203 rows × 4 columns

Using Tableau

You can access the database directly from Tableau via the PostgreSQL Connector. You will need the following details for the authentication:

  • Username: digibim
  • Password: Di&i$IM0815
  • Host: wi-sql.fh-muenster.de
  • Port: 5432

Please note that the databases are only accessible within the FH Münster network, i.e. either using eduroam or via VPN.

Figure 11: Tableau Posgres Connector
Back to top