Making MySQL WebScale - a Lesson in Database Normalization

MySQL is designed around doing good JOINS. Databases like looking up records from related tables; it makes them happy. As such, its a good idea to store every piece of data exactly once: then there’s only ever one place to store it.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
SELECT
    C.id AS customer_id,
    FNAME.value AS first_name,
    LNAME.value AS last_name,
    PHONE.value AS phone_number,
    O.id AS order_id,
    SHIP1.value AS shipping_line_one,
    SHIP2.value AS shipping_line_two,
    STATE.value AS shipping_state,
    CITY.value AS shipping_city
    ZIP.value AS shipping_zip,
    OL.id AS order_line_id
    ODATE.value AS order_date,
    P.id AS product_id,
    PNAME.value AS product_name,
    QTY.value AS quantity,
    PRICE.value AS price
FROM
    customers C
    JOIN strings FNAME ON (C.first_name_string_id = FNAME.id)
    JOIN strings LNAME ON (C.last_name_string_id = LNAME.id)
    JOIN integers PHONE ON (PHONE.id = C.phone_integer_id)
    JOIN orders O ON (O.customer_id = C.id)
    JOIN strings SHIP1 ON (SHIP1.id = O.shipping_line_one_string_id)
    JOIN strings SHIP2 ON (SHIP2.id = O.shipping_line_two_string_id)
    JOIN strings CITY ON (CITY.id = O.shipping_city_integer_id)
    JOIN strings STATE ON (STATE.id = O.shipping_state_integer_id)
    JOIN integers ZIP ON (ZIP.id = O.shipping_zip_integer_id)
    JOIN order_lines OL ON (OL.order_id = O.id)
    JOIN dates ODATE ON (ODATE.id = OL.order_date_date_id)
    JOIN products P ON (OL.product_id = P.id)
    JOIN strings PNAME ON (PNAME.id = P.name_string_id)
    JOIN integers QTY ON (QTY.id = OL.order_quanitity_integer_id)
    JOIN floats PRICE ON (PRICE.id = OL.price_float_id)

This is really fast because EVERY SINGLE JOIN is a Primary Key look up. WebScale aficionados know that PK searches are the fastest look up available to MySQL: its only limited by the speed of light inside the CPU cache-bus-lines.

Creating a new order is Super EASY! Let me show you:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
// Use $_POST instead of $_GET ($_POST cant be faked
// with the URL bar in the browser
$fname_id = query("SELECT id FROM strings WHERE value='$_POST[fname]'");
if (!$fname_id)
    $fname_id = execute_with_last_insert_id(
  "INSERT INTO strings SET value=$_POST[fname]");
}

$lname_id = query("SELECT id FROM strings WHERE value='$_POST[lname]'");
if (!$lname_id)
    $lname_id = execute_with_last_insert_id(
  "INSERT INTO strings SET value=$_POST[lname]");
}

$phone_id = query("SELECT id FROM integers WHERE value='$_POST[phone]'");
if (!$phone_id)
    $phone_id = execute_with_last_insert_id(
  "INSERT INTO integers SET value=$_POST[phone]");
}
$shipping_line_one_string_id = query("SELECT id FROM strings WHERE value='$_POST[shipping_line_1]'");
if (!$shipping_line_one_string_id)
    $shipping_line_one_string_id = execute_with_last_insert_id(
  "INSERT INTO strings SET value=$_POST[shipping_line_1]");
}

$shipping_line_two_string_id = query("SELECT id FROM strings WHERE value='$_POST[shipping_line_2]'");
if (!$shipping_line_two_string_id)
    $shipping_line_two_string_id = execute_with_last_insert_id(
  "INSERT INTO strings SET value=$_POST[shipping_line_2]");
}
$shipping_state_string_id = query("SELECT id FROM strings WHERE value='$_POST[shipping_state]'");
if (!$shipping_state_string_id)
    $shipping_line_state_string_id = execute_with_last_insert_id(
  "INSERT INTO strings SET value=$_POST[shipping_state]");
}
$shipping_line_city_string_id = query("SELECT id FROM strings WHERE value='$_POST[shipping_city]'");
if (!$shipping_city_string_id)
    $shipping_city_string_id = execute_with_last_insert_id(
  "INSERT INTO strings SET value=$_POST[shipping_city]");
}

$shipping_zip_integer_id = query("SELECT id FROM integers WHERE value='$_POST[shipping_zip]'");
if (!$shipping_zip_integer_id)
    $shipping_zip_integer_id = execute_with_last_insert_id(
  "INSERT INTO integers SET value=$_POST[shipping_zip]");
}
$order_date_date_id = query("SELECT id FROM dates WHERE value='$_POST[order_date]'");
if (!$order_date_date_id)
    $order_date_date_id = execute_with_last_insert_id(
  "INSERT INTO dates SET value=$_POST[order_date]");
}

$quantity_integer_id = query("SELECT id FROM integers WHERE value='$_POST[qty]'");
if (!$quantity_integer_id)
    $quantity_integer_id = execute_with_last_insert_id(
  "INSERT INTO integers SET value=$_POST[qty]");
}

$price_float_id = query("SELECT id FROM floats WHERE value='$_POST[price]'");
if (!$price_float_id)
    $price_float_id = execute_with_last_insert_id(
  "INSERT INTO floats SET value=$_POST[price]");
}
// finally,
$customer_id = execute_with_last_insert_id(
    "INSERT INTO customers SET
        fname_string_id = $fname_id,
        lname_string_id = $lname_id,
        phone_integer_id = $phone_integer_id");

$order_id = execute_with_last_insert_id(
    "INSERT INTO orders SET
        customer_id = $customer_id,
        shipping_line_one_string_id = $shipping_one_string_id,
        shipping_line_two_string_id = $shipping_two_string_id,
        city_string_id = $shipping_city_string_id,
        state_string_id = $shipping_state_string_id,
        shipping_zip_integer_id = $shipping_zip_integer_id");

execute_with_last_insert_id(
    "INSERT INTO order_lines SET
        order_id = $order_id,
        product_id = $product_id,
        quantity_integer_id = $quantity_integer_id
        price_float_id = $price_float_id");

Whoever said PHP and MySQL couldn’t scale????

Comments