Data Science Tutorial
- What is Data Science?
- Life Cycle of Data Analytics
- What is Machine Learning?
- Python Tools in Data Science
- Working with DataBase
- Data Science using R
- Hierarchical Indexing
- Data Science Using Scikit
- Clustering in Data Science
- Working with Network Data
- What is Plotting
- String Manipulation
- What is Text Analysis?
Working with DataBase
A relational database is a collection of permanently stored and possibly sorted and indexed tables. Relational databases are excellent for storing tabular data (such as data found in CSV files), where one table represents a variable type, the columns of the table represent variables, and the rows represent observations or records.
You don’t need Python to operate a database; however, you must know Structured Query Language (SQL) or its specific implementation, such as MySQL, to access a relational database either from its command line or from a Python application. To interact with a running MySQL database server from the command line, you need a MySQL client, such as mysql.
All MySQL commands are case-insensitive and must end with a semicolon. To start a new database project, use mysql as the database administrator (these operations are executed only once):
Start mysql on the shell command line:
c:\myProject> mysql -u root -p
Enter password: Welcome to the MySQL monitor.
Commands end with ; or \g.
«More mysql output»
Enter all further instructions at the mysql command-line prompt.
- Create a new database user (“dsuser”) and password (“badpassw0rd”):
- Create a new database for the project (“dsdb”):
- Grant the new user access to the new database:
CREATE USER ‘dsuser’@’localhost’ IDENTIFIED BY ‘badpassw0rd’;
CREATE DATABASE dsdb;
GRANT ALL ON dsdb.* TO ‘dsuser’@’localhost’;
Now, it’s time to generate a new table in an existing database. Use the same mysql client, but log in as a regular database user:
c:\myProject> mysql -u dsuser -p dsdb
Welcome to the MySQL monitor. Commands end with ; or \g.
«More mysql output»
Typically a table is created once and accessed many times. You can later change the table’s properties to accommodate your project needs. The command CREATE TABLE, followed by the new table name and a list of columns, creates a new table. For each column, define its name and data type (in this order). The most common MySQL data types are TINYINT, SMALLINT, INT, FLOAT, DOUBLE, CHAR, VARCHAR, TINYTEXT, TEXT, DATE, TIME, DATETIME, and TIMESTAMP.
The following command creates the table employee with the columns empname (text of variable length), salary (a floating-point number), and hired (date). Each record in the table describes one employee.
CREATE TABLE employee (empname TINYTEXT, salary FLOAT, hired DATE);
Query OK, 0 rows affected (0.17 sec)
When you don’t need a table anymore, you can drop it from the database.
DROP TABLE employee;
Query OK, 0 rows affected (0.05 sec)
The DROP command is short, elegant, and bitterly irreversible, like spilling your milk. Think twice before you drop anything!
MySQL supports five basic database operations: insertion, deletion, mutation, selection, and join. They are used to populate database tables and modify and retrieve the existing data. These operations would typically originate in your data analysis program, but to get the sense of them, we will first practice them at the mysql command-line prompt.
We’ll insert a new record into a table, and then another one, and another one, until the table has all observations.
INSERT INTO employee VALUES (NULL,NULL,"John Smith",35000,NOW());
The first two NULLs are placeholder values of the index and the timestamp. The server calculates them automatically. The function NOW() returns the current date and time, but only the “date” part is used to populate the record. Note that the query produced a warning, and the reason for it is the latter truncation.
Deletion removes from the table all the records that match the search criterion. If you don’t specify the search criterion, the server will remove all records:
Remove John Smith if he is low-paid
DELETE FROM employee
WHERE salary < 10000 AND empname="John Smith";
DELETE FROM employee;
If you want to delete only a precise record, use its unique primary key or any other unique identifying condition:
DELETE FROM employee WHERE id=387513;
Mutation updates the values of specified columns in the records that match the search criterion. If you don’t specify the search criterion, the operation will affect all records:
UPDATE employee SET salary=35000 WHERE hired=CURDATE();
// Increase John Smith's salary again
UPDATE employee SET salary=salary+1000 WHERE empname="John Smith";
Selection selects all requested columns from all records that match the search criterion. If you don’t specify the search criterion, you’ll get all records, which may be way more than you want:
SELECT empname, salary FROM employee WHERE empname="John Smith";