Thursday, August 21, 2014

Http, Https, Ajp which one to choose.

http, https and ajp - comparison and choice

In a web scenario, client to server traffic is usually carried using an http (HyperText Transfer Protocol) transport. That's both from browser to public facing server, but also in ongoing transfers from the public facing server to other servers which provide content or run business logic in many applications.

But you'll note that I said "usually" - there are other transports that are available and used. The first group are those which transport the same data as http - specifically https and ajp. let's start off with describing what's in http.

What is http?

An http request comprises a series of lines of data, each new line terminated. The first of these lines comprises the request method (such as GET or POST) followed by the name of the resource required (such as /index.html) followed by a protocol version (such as HTTP/1.1). Subsequent lines include such things as the name of the host being contacted, referrer headers, cookies, the type of the browser, preferred language, and a whole host more details. In HTTP/1.1 only the name of the host being contacted is required in subsequent lines - the rest are conditional or optional. In the case of the POST method, the header is followed by the data that's associated with the request. An http requested is followed by a blank line which indicated that it is complete.

A server processes an http request and sends out a response. The response comprises a header block, a blank line, and (in most cases) a data block. The first line of the header includes a response code which indicates the success or otherwise of the request - a 3 digit number in the following ranges:
200 and up - success; expect good data to follow
300 and up - good request but only headers (no data). e.g. page has moved
400 and up - error in request. e.g. request was for missing page (404)
500 and up - error in handling request. e.g. program on server has syntax error

This line of the header block is followed by other headers telling the receiving system the content type (Mime type) which allows that receiving system to know whether to handle it as HTML, and a JPEG image, etc. Then there's a blank line and the actual data.

As there are often multiple requests made from the same client to the same server in quick succession (for example a web page will call up images), the connection often stays alive for a few seconds under HTTP/1.1.

See http protocol specification for further details

So what is https?

The https protocol carries the same information as http, but adds to it a secure socket layer (SSL). In other words, the data is encrypted at the client and decrypted at the server, and then the same happens in reverse. The purpose of this encryption is to ensure that stray data packets that are viewed along the way are no use the person who has them - they're uninterpretable binary data.

The https scheme is quite complicated - it starts off with the client having to establish that it's really talking to the correct server (and not some other machine pretending to be the correct server!) and then goes on to agree with that server just how things will be uniquely encoded. The same keys can't be used for multiple connections between different systems, or individual security would be compromised.

See https protocol - detailed description

How about AJP then? How does that compare to HTTP?

The http protocol is quite expensive in terms of band width - it's an ascii text protocl with words like "POST" and phrases like "Content-type:" taking up more bandwidth than is really needed, and having to be interpreted at destination too. So the ajp protocol (Apache Java Protocol?) was established to allow for much less expensive exchanges between upstream and downstream servers that are to be closely linked.

ajp carries the same information as http but in a binary format. The request method - GET or POST - is reduced to a single byte, and each of the additional headers are reduced to 2 bytes - typically, that's about a fifth of the size of the http packet.

See ajp protocol specification for further internal details.

Should I use http, https or ajp?

For most browser to server traffic, use http. If there's a need for security in the data (or if you're in doubt / customers may question the security), use https.

Between servers, http actually works very well - if you have an Apache httpd fronting a number of other servers (be they Apache http or Apache Tomcat), then there's nothing wrong with using the protocol at that layer too. Httpd's mod_proxy and mod_rewrite both allow for forwarding, and server languages such as PHP and Perl can make outgoing requests from the top tier server to other servers using http.

If you're looking to share the load between a number of second level (application) servers from a top level httpd server, mod_proxy_balancer introduced in Apache httpd 2.2 provides you with the tools that you'll need, and mod_rewrite can also do a good load distributions job (although the distribution algorithm is simple). For programs running on the server, outgoing requests can be distributed programatically.

One of the big issues of forwarding to a series of machines to balance the load is making sure that a series of linked pages and data entries called up by the same user are properly co-ordinated ("session continuity" it is called) and both mod_proxy_balancer and mod_rewrite provide the facility to support this. In the case of mod_proxy_balancer, it's a core feature. With mod_rewrite, a clever configuration.

If you have intensive / busy servers with bandwidth issues between them, use ajp as your linking protocol. The now-excellent mod_jk (available for you to build from the Jakarta project in Apache httpd 2.0 and prior, standard with the httpd distribution from Apache 2.2) provided an excellent use of the protocol, and support in Tomcat is strong. Many commercial systems are using ajp as their transport, and some recent benchmarks I did showed it to be 25% faster that httpd. You, should, though, remember that the transport is only a tiny part of most applications and so the savings are likely to be minimal on a real live system.

See protocol documents if you want to read further into this.

This is quite a long story, isn't it? If you're setting up multiple servers and sharing resources, you may want to learn the deployment and configuration details. We run several courses that may help you, where you get a chance to set up and try out the various options - see Deploying Apache httpd and Tomcat if you're linking the two servers, or Linux / Unix Web Server if you're configuring / linking multiple copies of httpd. We can also arrange specific private courses for groups, and / or short consultancy sessions. Contact me - graham@wellho.net to talk about your particular needs.

Other Protocols

To help complete the picture - protocols such as ftp and rmi transport different types of content, and xml, soap and the like are different layers. Again - I can cover that for you if needed!

No comments:

Post a Comment