We just deployed Nginx as a reverse proxy in front of our production Apache server. Before deploying, we took some page load benchmarks so that we could get an idea of what we improved with this change. The benchmark was a simple Python script that opened 100 concurrent connections (repeated 3 times to spot outliers) to a list of URLs in different categories:

  • Dynamic Code – mainly PHP scripts, but essentially any application
  • Static Files – styles, pictures, audio, etc.
  • Dynamic Cache – files that are constantly updated but require no server-side processing (served from Nginx with a reduced cache lifetime)

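For the "Dynamic Cache" category, the reduced caching could look something like the snippet below. This is a hedged sketch, not our actual config: the cache zone name, path, backend address, and timings are all illustrative.

```nginx
# Illustrative only: a short-lived proxy cache for frequently-updated
# files that need no server processing.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=dyn:10m max_size=100m;

server {
    listen 80;

    location /feeds/ {
        proxy_pass http://127.0.0.1:8080;   # Apache backend (hypothetical)
        proxy_cache dyn;
        proxy_cache_valid 200 10s;          # serve cached copies briefly
    }
}
```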
The results of these benchmarks were quite astounding, leaving me wondering why it took us so long to make this change. Below are my results; in each picture, the left side is Apache and the right is Nginx:

dynamic_code.png
static_files.png
dynamic_cache.png

Below is the script that I used to perform this benchmark (written in Python):

#!/usr/bin/pypy
##
#    Web Benchmarks
#    
#    @author     David Lasley <dave@dlasley.net>
#    @package    toolbox
#    @version    $Id: web_benchmarks.py 1183 2013-08-31 01:40:23Z dlasley $

import time
from multiprocessing import Pool
from wins.toolbox.html_tools import html_tools
from wins.toolbox.pass_extract import pass_extract
from wins.core.excel import excel

SITE_PREFIX = 'https://domain.com/' #< Change domain

# Add sites relative to SITE_PREFIX
CHECK_SITES = {
    'Dynamic Code':[
        # URLs generated by code
    ],
    'Dynamic Cache':[
        # URLs that are statically served; reduced nginx cache
    ],
    'Static':[
        # URLs that are statically served
    ],
}

CONCURRENT_CONNECTIONS = 100 #< Number of concurrent threads to open
REPEAT_TIMES = 3 #< Number of times to repeat the test (per site)
WAIT_TIME = 60 #< Seconds to wait after each site (otherwise the load average with just Apache goes through the roof)

benchmarks = {}

# Get credentials from encrypted DB
creds = pass_extract('dlasley','AD')
opener = html_tools.url_retriever_auth(SITE_PREFIX, creds.username, creds.current_pass)
#opener = html_tools.url_retriever()

def url_open(check_site):
    ''' Simple URL open timer
        @param str check_site URL to check
        @return float Time taken for operation '''
    start = time.time()
    response = html_tools.url_open(opener, check_site)
    if response.code != 200:
        print '%s failed with code %s' % (check_site, response.code)
        return 'F:%s' % response.code
    else:
        return time.time()-start

# Gen the threading pool and loop all sites by category
pool = Pool(CONCURRENT_CONNECTIONS)
for sys_type, check_sites in CHECK_SITES.iteritems():
    benchmarks[sys_type] = {}
    for check_site in check_sites:
        print 'Benching %s' % check_site
        check_site = '%s%s' % (SITE_PREFIX, check_site)
        benchmarks[sys_type][check_site] = []
        for i in xrange(0, REPEAT_TIMES):
            results = pool.map(url_open, [check_site]*CONCURRENT_CONNECTIONS)
            benchmarks[sys_type][check_site] += results
        print 'Waiting %d' % WAIT_TIME
        time.sleep(WAIT_TIME)
        
# Close the pool to avoid mem leaks
pool.close()

# Create new Excel workbook and dump data
workbook = excel(sheet_title='Benchmarks')
for sys_type, benchmark_data in benchmarks.iteritems():
    workbook.new_sheet(sys_type)
    # One column per sample: each site collects
    # CONCURRENT_CONNECTIONS * REPEAT_TIMES timings
    write_data = [ ['Site', ] + range(0, CONCURRENT_CONNECTIONS*REPEAT_TIMES) ]
    for sys_name, benches in benchmark_data.iteritems():
        write_data.append([sys_name.replace(SITE_PREFIX, '')] + benches)
    workbook.write_data(write_data)

workbook.save('/tmp/benchmarks.xlsx')
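The script above depends on our internal wins.* helper modules, so it won't run elsewhere as-is. For readers who want to try the same idea, here is a minimal, self-contained sketch using only the Python 3 standard library; the URL, pool size, and timeout below are placeholders to adjust, not values from the original script.

```python
# Minimal sketch of the benchmark loop: time N concurrent GETs against a
# URL, repeated a few times, using only the standard library.
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen
from urllib.error import URLError

CONCURRENT_CONNECTIONS = 100  # concurrent requests per round
REPEAT_TIMES = 3              # rounds per URL, to spot outliers

def time_request(url):
    """Return seconds taken for one GET, or an 'F:<reason>' marker on failure."""
    start = time.time()
    try:
        with urlopen(url, timeout=30) as response:
            response.read()
            if response.status != 200:
                return 'F:%s' % response.status
    except URLError as exc:
        return 'F:%s' % exc.reason
    return time.time() - start

def bench(url, pool_size=CONCURRENT_CONNECTIONS, repeats=REPEAT_TIMES):
    """Hit `url` with pool_size concurrent requests, `repeats` times over."""
    results = []
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(repeats):
            results.extend(pool.map(time_request, [url] * pool_size))
    return results
```

Each call to `bench()` returns pool_size * repeats samples for one URL, which you could then dump to a spreadsheet or summarize however you like.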