Automating Spreadsheets with PHP / MySQL

January 21, 2019

Rahul Wahi

Spreadsheets are a great tool. Whether you use Excel, or OpenOffice, or any other offerings, you have the ability to manipulate (relatively) large amounts of data. You can do quick calculations, do vlookups, and create pivot tables with little effort.

However your sheets have gotten larger and larger. You have data stored on multiple tabs and are using calculations which jump back and forth. Problems start to occur. Your data integrity starts to become unreliable. Maybe you’ve shared copies with others, and suddenly your data is inconsistent between one data source and another; nobody knows which is the most recent version, and worse…. they all are and none of them are. On top of this, all it takes is one bad sort, or any sort of corruption and you’ve lost all of your data.

You are starting to mimic functionality that is really better suited to a database. Don’t get me wrong, I love spreadsheets and use them a lot myself. However it is important to understand their uses and what they are good for. As spreadsheets are highly unstable, they are not good at storing large amounts of data.

While on the surface, they may seem more complicated than spreadsheets, databases have many advantages. They are much better at handling complex data, they are far more stable, and they allow you to store all of your information in one place.

Databases may take a little time to get used to and to migrate your data, however, once you get used to them, they are not as complex as you might think, and the payoff is huge. Also, for quick data manipulation, you can always simply export it into a CSV file and do what you wish with the spreadsheet of your choice.

We will cover the following steps:

  • Preparing your data for migration
  • building the database
  • moving your data from the spreadsheet into the database
  • Using PHP to pull the data from the database back into a CSV format

Getting Started

To start, if you have small *AMP stack already installed (such as XAMPP), you have all of the pieces necessary: you have PHP and you have MySQL, and likely a nice graphical tool such as phpMyAdmin or MySQL Workbench. I’m also going to make the assumption that you have some basic understanding of PHP, such as how to work with arrays, loops and the like.

Organizing your data

For the purpose of this article, I’m going to use the example of a small car rental business. Let’s assume you’ve been keeping your records on a spreadsheet, possibly all on one sheet, or spread across several.

Let’s also assume that you started with just one sheet in your workbook. For each car rental you have columns including the rental date, the make, model, and year, the plate number, the price per day, the customer, and the number of the days rented. You may have a new row for each item.

Before long you start seeing a lot of items get repeated. The same car is rented by many people, sometimes the same person. One person rents different cars.

You discover that one simple sheet is clearly not enough so of course you started splitting pieces off into different tabs. Cars are now on a separate tab, so do customers, and rental transactions. You’ve been able to identify which car has been rented at which time by a unique identifier, likely to be something like the plate number.

So you keep transactions on a separate tab, and you keep your customers on a separate one as well. It is getting more and more complicated as you realize that you need to keep a record of more and more data. At this point, it makes sense to migrate existing data over to a database.

Normalization

If you have been sorting data off into different tabs, you probably already have the basic idea. With normalization, you only include information when it is needed, and stored in only one place. If you notice that there is a small bit of information that keeps getting used over and over again, you can split that off. With a spreadsheet, however, it’s very easy to accidentally have typos resulting in some items being listed two or more times when they are in fact are the same thing.

With the normalization process, you simply reduce the amount of data necessary to its basic units. Part of the magic of a databases is that instead of relying on a complex id (such as a license plate for an individual vehicle) you can simply assign a numerical value, which is the “primary” or identifying key of the item. This makes it easy to reference this vehicle in any place else it is used, such as the rental transactions.

Let’s break this down:

After listing the unique items that are necessary for a individual entity, we identify places where certain items are duplicated. We then identify individual items (much like the objects that you are used to in PHP) and in any place we see a reference, we create a reference, or “foreign key” to the original item.

Let’s start with an item that will live alone by itself, which we will use as a lookup to maintain some level of “authority control” (a method to make sure that every reference to an item is always referencing the correct item).  

In this case we have the make and model of a car. Here is how this would look

  • model: id, make, model

Next we have the car.  As we know that a car will only have one model, we will use “model_id” as a reference to the id on the model sheet:

  • car: id, model_id, year, rate

Our customer is unique, so the sheet would look something like this:

  • customer: id, last_name, first_name, address, phone, etc.

Finally we have the rental sheet which puts together the other information. Note that each rental references a car, and a customer. We can also have a column for cost, which can be calculated off of other information (days rented times the cost on the car sheet)

  • rental: id, car_id, customer id, date rented, days, total cost (calculated)

We now have 4 separate spreadsheets or tabs.

If you are a spreadsheet expert, you likely have make and model stored on a separate sheet and are bringing over the data by a Vlookup. You can do this even easier (and more stably) in your database. The “cost” of a rental is a simple formula created by multiplying days times rate.  

Cars have one model, and a model can exist for many cars.  Rentals involve only one car, and car can be rented many times.  Similarly, a rental involves one customer, but a customer can rent many different cars.

At this point we are almost there. These will become your tables in your database. Here’s a simple visualization:

Automating Spreadsheets with PHP / MySQL 1

 

Setting up your MySQL database

Following the guidelines above, you simply need to create tables for each entity (Note: if you use a tool like MySQL Workbench, you can actually generate the SQL dump file which will create it for you).

You will want to name your database something like “rentals” or something else that makes sense to you. To create your database, you can either do it in whichever graphical tool you are using (such as phpMyAdmin), or you can do it manually, like so:

CREATE DATABASE rentals;

Your columns from your spreadsheet become “keys” on your database. To build your tables, You will want to pay attention to the datatypes listed above. Each table has a “primary key” which is the unique identifier we created for each table with “id.” Primary keys are almost always INT, and text fields are given a VARCHAR datatype. Here’s a simple table creation for the model table. (Please note again, this is not necessary if you are using a graphical tool, but you may find it cleaner to do it with the command line, or in the SQL window of your tool).

CREATE TABLE IF NOT EXISTS `rentals`.`model` (
`id` INT NOT NULL, AUTO_INCREMENT,
`make` VARCHAR(45) NULL,
`model` VARCHAR(45) NULL,
PRIMARY KEY (`id`))
ENGINE = InnoDB;

 

The car table would look like this:

CREATE TABLE IF NOT EXISTS `mydb`.`car` (
`id` INT NOT NULL AUTO_INCREMENT,
`model_id` INT NOT NULL,
`year` INT NULL,
`rate` DECIMAL(4) NULL,
PRIMARY KEY (`id`))
ENGINE = InnoDB;

Creating your input queries

The syntax for inserting data into a table in MySQL is

insert into tablename (key1, key2) values (value1,value2);

Of course, you likely have a lot of data already, so you’re probably thinking, “I don’t want to have to reenter everything!” That of course would be a pain Well that’s the beauty of combining functionality from your spreadsheets to enter data into MySQL. One of the great advantages of spreadsheets is that using simple spreadsheet functions, you can just write one row, and dynamically create the rest.

For example, entering the data for your model table is as easy as this (note, that VARCHAR values must be enclosed in single quotes):

You can simply drag column D down for all of your rows, and you end up with something like this:

Automating Spreadsheets with PHP / MySQL 2

For the car table it would be similar, just like this. Note that model_id matches up with id from the model table:

Automating Spreadsheets with PHP / MySQL 3

Just copy and paste everything from column D into your SQL window and voila, you have your database populated.

Automating Spreadsheets with PHP / MySQL 4

To get a listing of all cars, including the make and model, you can run this query

SELECT c.id, c.year, m.make, m.model, c.rate
FROM car c
JOIN model m
ON c.model_id = m.id

Creating your PHP Application layer

Now you have your database which you can query as you like. However, you will likely want to have some sort of graphic interface. This is where your knowledge of PHP comes in. You can create an interface (however you like) to enable you to view, manage, and enter data through a web interface.

To connect to your database, create a PDO connection, with a method like this


class Database

   private $host = 'localhost';
   private $dbname = 'rentals';
   private $username = 'database username';
   private $password = 'database password';
   public $conn;

public function getConnection() 
{
   $this->conn = null;
   
   try{
      $this->conn = new PDO("mysql:host=".$this->host.";dbname=".$this->dbname, 
       $this->username, $this->password);
      
       }catch(PDOException $exception){
           echo "Connection error: ".$exception->getMessage();
       }
       return $this->conn;
}

 

You can access your data in the spreadsheet with a method like this:

class Cars {

function carOutput()
   {
      $query = "SELECT c.id, c.year, m.make, m.model, c.rate

             FROM car c

             JOIN model m

             ON c.model_id = m.id
              order by make";

     $stmt = $this->conn->prepare( $query );
     $stmt->execute();    
     return $stmt;
   }

}

 

You can use this method to call any of the data from your new database, and it will generate your results as an any other array, and you can display it on a webpage, or however you like.

Of course, spreadsheets are still great for reporting purposes, or to do some last minute manipulation. It is relatively easy to get your data back out again into a format that your spreadsheet software will understand. To do this, the following script will generate a CSV file which can load into the spreadsheet of your choice.

header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename=persons.csv');

$database = new Database();
$db = $database->getConnection();
$cars = new Cars($db);
$stmt = $cars->carOutput()
$csv = fopen('php://output', 'w');
fputcsv($csv, array('id', 'year', 'make', 'model', 'rate'));

while ($row = $stmt->fetch(PDO::FETCH_ASSOC))
{
   fputcsv($csv, $row);
}
fclose($csv);  


Summary

You now have a basic understanding of how to convert your spreadsheets into a format that will work for a database, creating a database, loading your data, and getting it back out again.

This is, of course, just a taste.  To learn more about manipulating databases through a PHP interface, keep checking this space for future articles!

 

312.487.2482
Start a Project