CentOS and old versions of PHP

When you ask someone to set you up with an install of the latest CentOS, you're still going to be stuck with an old version of PHP (version 5.1.6), which is missing many, many useful features and includes a few bugs. Then salt is rubbed into the wound when you try to update PHP in yum and discover that 5.1.6 is the latest version in the official repository.

Here is a post that explains an easy way to update to PHP 5.2.x without having to build from source:

http://www.freshblurbs.com/install-php-5-2-centos-5-2-using-yum

It relies on a third-party Yum repository that contains a more up-to-date version of PHP and its extensions:
http://www.jasonlitka.com/yum-repository/
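
In broad strokes, the approach looks like the following (the repo definition here is reproduced from memory, so verify it against the linked posts before using it):

    # Define the third-party repo (contents assumed -- see the linked
    # posts for the authoritative version):
    cat > /etc/yum.repos.d/utterramblings.repo <<'EOF'
    [utterramblings]
    name=Jason's Utter Ramblings Repo
    baseurl=http://www.jasonlitka.com/media/EL$releasever/$basearch/
    enabled=1
    gpgcheck=1
    gpgkey=http://www.jasonlitka.com/media/RPM-GPG-KEY-jlitka
    EOF

    # Then pull in the newer PHP packages:
    yum update php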

Getting Serious about Data Redundancy

Reluctantly, I have had to assume the role of "server guy" with my translation company. I generally prefer to focus on the creative side of web application development, but I'm not naive enough to think that server backups and security can be completely ignored... so it falls to me to make sure that we are prepared for a catastrophe of any kind. This weekend I spent some time reviewing our current situation and implementing improvements.

In reviewing our backup strategy, we must first consider what types of catastrophes we want to be prepared for. Some possible problems we might face include:

1. The site could be hacked and data corrupted or deleted.
2. We could experience hardware failure (e.g. a hard drive could die or the server could conk out).
3. We could face a major regional disaster like the earthquake/tsunami that hit Japan recently.

We also need to consider our tolerance for downtime and the frequency of database changes. For example, a simple backup strategy might involve keeping an off-site copy of the files and database so that you can retrieve them if necessary. But for larger setups, this strategy may mean upwards of 24 hours to get back online in the case of a failure (re-uploading the data, setting up the server configuration, etc.). If you're working in an environment where even a few minutes of downtime is a problem, then you need a strategy that will allow you to be back online much faster.

Similarly, if you are only backing up once every 24 hours, you could potentially lose 24 hours' worth of user updates if you had to revert to a backup.

In our case we are running a 2-tier backup strategy:

1. Hot backup: This is a backup that is always synchronized with the live site so that it can be brought online with the flip of a switch.
2. Archived backup: The focus of this backup is to be able to withstand a regional catastrophe, or to revert to a previous version of the data in case of corruption that has also affected the hot backup.

Hot Backup Strategy

For the hot backup we are using MySQL replication to run a slave server that is always in sync with the master (a minimal configuration sketch follows the list below). This is useful for two reasons:

1. If there is a failure on the master's hard drive, we can switch over to the slave without any downtime or loss of data.
2. If we need to produce a snapshot of the data (and shut down the server temporarily), it is easier to work off of this slave so that the live site never needs to be taken down for backup maintenance.
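
Here is the sketch of the moving parts (server IDs, host names, and credentials below are placeholders, not our actual configuration):

    # master my.cnf: give the server an ID and enable the binary log
    [mysqld]
    server-id = 1
    log-bin   = mysql-bin

    # slave my.cnf: just needs its own unique ID
    [mysqld]
    server-id = 2

    -- then, on the slave, point it at the master and start replicating:
    CHANGE MASTER TO
      MASTER_HOST = 'master.example.com',
      MASTER_USER = 'repl',
      MASTER_PASSWORD = 'secret',
      MASTER_LOG_FILE = 'mysql-bin.000001',
      MASTER_LOG_POS = 0;
    START SLAVE;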

Archived Backup Strategy

We are running our more critical sites on Amazon's Elastic Computing Cloud service (EC2) because of its ease of scalability and redundancy. We are using Elastic Block Storage (EBS) for the file systems which store both the application files and the database data. This makes it easy for us to take snapshots of the drives at any point in time. For our database backups, we periodically take a snapshot of the EBS volume containing our database data. (First we set a read lock on the database and record the master status so we know exactly which point in the binary log the snapshot corresponds to.) If there is a failure at any point in time, we just load the most recent snapshot, then rebuild the data incrementally using the binary log.
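
Roughly, the locking step looks like this (the snapshot command is from Amazon's EC2 API tools; the volume ID is a placeholder):

    -- on the database server: quiesce writes and note the binlog position
    FLUSH TABLES WITH READ LOCK;
    SHOW MASTER STATUS;      -- record the File and Position values

    -- while the lock is held, snapshot the EBS volume from a shell:
    --     ec2-create-snapshot vol-12345678

    -- then release the lock so the site can resume writing:
    UNLOCK TABLES;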

Amazon's snapshot feature for EBS is a real lifesaver. Actually copying the data when you're talking about hundreds of gigabytes is quite time consuming. With EBS, however, it only takes a minute or two, as it uses a clever incremental scheme for producing the snapshot. This is one of the key reasons why I'm opting to use EC2 for our critical sites.

Just in Case

Amazon claims to keep redundant copies of all of our snapshots, distributed across different data centres... but just in case, I like to have a local backup available for our own purposes. So I use rsync to perform a daily backup to a local hard drive. Hopefully I never need to use this backup, but it is there just in case.
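
The daily job is nothing fancy; it amounts to something along these lines (the host and paths are placeholders):

    # -a preserves permissions/ownership/timestamps, -z compresses in
    # transit, and --delete keeps the local mirror exact.
    rsync -az --delete user@server.example.com:/var/data/ /backups/server-data/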

We can always get better…

This backup strategy helps me sleep at night, but there are still some things about it that could be improved. Our database backups are now pretty rock solid, as we could recover from nearly any failure without experiencing any data loss using a combination of a snapshot and the binary log. For the file system, however, we don't have the equivalent of a binary log to rebuild the file system incrementally from the most recent snapshot. I know that this can be achieved, and more seasoned "server people" probably think this is a no-brainer, but I'm a software guy, not a server guy, so go easy...

Heritage Pianos Chinese and Korean Sites Live

Thanks to the hard work of our team, we’ve completed the translation of Heritage Pianos’ website into both Chinese and Korean.

These sites are powered by a new proxy version of SWeTE that combines SWeTE's simple translation workflow with its search engine optimization capabilities. Because the sites are powered by SWeTE, it will be quite easy to keep them in sync as inventory is added to the English site.

Some Background on the Project

The Heritage Pianos website is powered by Joomla!, which allows its owners to quite easily make modifications to the content of the site. They wanted, however, to be able to offer Chinese and Korean translations of their website as well. Joomla! does have internationalization modules available to help convert an existing site into multiple languages, but these wouldn't solve the more difficult problem of keeping the translations in sync and managing a workflow with professional translators.

These are the things that SWeTE excels at. To get the project kicked off, we just added a tiny, invisible HTML snippet to the header of the site's template that allows it to hook into the SWeTE translation workflow. We then let SWeTE parse the pages into translatable chunks that we could bundle into jobs for our Chinese and Korean translators. Meanwhile, we set up a proxy site that would be able to display the translated versions of the site once the translations were complete.

When everything was complete, we simply modified the snippet in the Heritage Pianos Joomla! template so that the language options appear, letting users switch between languages.

Keeping the Translations in Sync

When changes are made to the English website, we can perform a scan with SWeTE to detect them. The changes can then be bundled into a job for the translators, translated, and ultimately approved by the site owner so that the translated versions are synchronized once again.

Adventures with KeepAlive Slowing down PHP

Disclaimer: To my regular readers (if there are any), this post is entirely technical in nature, so if you're not looking for the solution to this particular problem (e.g. via Google), you're probably not interested in this post.

I run a few sites that use PHP as a reverse proxy for other sites. The PHP script loads a specific webpage, and sub-requests for resources like stylesheets, images, and JavaScript files are also processed by the same PHP script. For most of these resources the script simply passes the content through unchanged.
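
As a minimal sketch of that pass-through case (the real script does considerably more; the target URL and variable names are made up for illustration):

    <?php
    // Mirror the requested path onto the target site and pass the
    // response through unchanged.
    $targetBase = 'http://www.example.com';       // proxied site (placeholder)
    $url = $targetBase . $_SERVER['REQUEST_URI'];

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);

    if ($type) {
        header('Content-Type: ' . $type);
    }
    echo $body;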

Problem:

When a page that includes a large number of supporting resources like CSS and JavaScript files is loaded, some of those resources take a long time to come back. Some scripts were taking upwards of 15 seconds to load as part of the page, even though, when requested on their own through the PHP script, they loaded instantly.

Short Term Solution

Investigation determined that the problem lies in Apache's KeepAlive setting. Turning KeepAlive off fixes the issue and consistently yields excellent load times for all resources. But WHY??
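
For anyone else hitting this, the workaround is a one-line change in Apache's configuration, followed by a restart:

    # httpd.conf -- disable persistent connections
    KeepAlive Off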

Long Term Solution

KeepAlive is generally a good thing, and it is supposed to boost performance. So why is it a bad thing in this context? (I really don't know the answer yet.) I would like to find a PHP solution to this problem that will work with KeepAlive enabled, but I can only speculate at this point as to why this might be happening.

Possible reasons that I’ve considered:

  1. I may need to do something more than "exit" at the end of the PHP script to declare that I'm done with the request, since Apache could be keeping the connection open longer than necessary and preventing subsequent requests from being processed.
  2. Perhaps there is some setting that I am unaware of in PHP that limits the number of simultaneous requests it can handle from the same client... (I find this unlikely, at least at the level we're talking about here.)

Edit:

I found that simply adding a 'Connection: close' header in the PHP script resolves this issue.
But I'm still not satisfied. This shouldn't be necessary, IMHO. If you set the Content-Length header properly and then flush that content to the browser, there's no reason why PHP should be making the browser wait around indefinitely.
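
For concreteness, the fix amounts to something like this at the end of the script ($body is a placeholder for whatever content is being served):

    <?php
    $body = '...';   // placeholder: the resource content being proxied

    // Telling the client the connection is finished stops it from
    // waiting out the keep-alive timeout before its next request.
    header('Connection: close');
    header('Content-Length: ' . strlen($body));
    echo $body;
    flush();
    exit;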

So maybe we're leaving a few performance points on the table, but at least it's not hanging like it was.

How I use the iPad

It’s been close to a year since I purchased my iPad, so I thought I’d post a short piece reflecting upon how I have ended up using it.

Things I tried that didn’t really work:

1. Word processing. The keyboard is OK for typing short little things, but it is just too cumbersome for writing anything substantial. I think that even if I had the Bluetooth keyboard, it still wouldn't be a good solution for word processing. Selecting text, copying and pasting, and even inserting text at a different position in the document are all too difficult at this time.

2. Note taking. I tried taking the iPad to a few meetings for the purpose of note taking. It was a novelty, but ultimately it is just easier to take notes on paper and then transcribe them on the laptop later.

Things I tried that really work:

  1. Reading the newspaper. I use the Pressreader app, which gives me access to 1,700 newspapers from around the world for $30 per month. I generally read through the Vancouver Sun, the Province, 24 H, and the Washington Post every morning before I go to work.
  2. Reading books (on the Kindle app, not iBooks).
  3. Watching movies in bed. I use Web Lite TV to stream my entire iTunes movie collection to my iPad, which effectively turns the iPad into my bedroom TV. I also use the Netflix app, which works quite well when I want to watch a movie or TV show that I don't have in my personal collection.
  4. Facebook in bed. Rather than use a Facebook app, I just added the Facebook site to my home screen (so it gets an icon and acts like an app). I much prefer checking Facebook on my iPad to logging in on my computer.
  5. Reading email in bed. I check my email from bed on the iPad, which allows me to make a mental inventory of things I need to reply to. Short replies I will make on the iPad directly, but generally I'll go to a computer to write more detailed replies.
  6. Browsing the web in bed. It's just easier to use an iPad than a laptop in bed, and it's pretty damn easy to browse the web on the iPad.

The iPad has secured a permanent place in my life, though it isn't threatening to replace the laptop in the foreseeable future. Of course, this is exactly what Apple intended when they designed it.