2015-01-19

The HTTP protocol

Quick reminder about HTTP

When HAProxy is running in HTTP mode, both the request and the response are
fully analyzed and indexed, thus it becomes possible to build matching criteria
on almost anything found in the contents.

However, it is important to understand how HTTP requests and responses are
formed, and how HAProxy decomposes them. It will then become easier to write
correct rules and to debug existing configurations.

## 1.1. The HTTP transaction model

The HTTP protocol is transaction-driven. This means that each request will lead
to one and only one response. Traditionally, a TCP connection is established
from the client to the server, a request is sent by the client on the
connection, the server responds and the connection is closed. A new request
will involve a new connection :

  [CON1] [REQ1] … [RESP1] [CLO1] [CON2] [REQ2] … [RESP2] [CLO2] …

In this mode, called the “HTTP close” mode, there are as many connection
establishments as there are HTTP transactions. Since the connection is closed
by the server after the response, the client does not need to know the content
length.

Due to the transactional nature of the protocol, it was possible to improve it
to avoid closing a connection between two subsequent transactions. In this mode
however, it is mandatory that the server indicates the content length for each
response so that the client does not wait indefinitely. For this, a special
header is used: “Content-length”. This mode is called the “keep-alive” mode :

  [CON] [REQ1] … [RESP1] [REQ2] … [RESP2] [CLO] …
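
To make the difference between the two modes concrete, here is a minimal Python sketch of how a client could decide where a response body ends. The function name `read_body` and the lowercase-keyed `headers` dict are assumptions made for this illustration; chunked transfer encoding is not covered by the text above and is ignored here.

```python
import io


def read_body(stream, headers):
    """Read one response body from a binary stream.

    `headers` is assumed to be a dict with lowercase header names
    (a hypothetical representation chosen for this sketch).
    """
    length = headers.get("content-length")
    if length is not None:
        # Keep-alive mode: read exactly the announced number of bytes,
        # leaving the connection positioned at the next response.
        return stream.read(int(length))
    # Close mode: no length is known, so the body runs until the server
    # closes the connection (end of stream).
    return stream.read()


# Two responses back to back on one connection, the first being 5 bytes long.
connection = io.BytesIO(b"helloNEXT")
first = read_body(connection, {"content-length": "5"})
rest = connection.read()
```

Without the length, the client could not tell where the first body stops and the next message begins, which is why the header is mandatory in keep-alive mode.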

Its advantages are a reduced latency between transactions, and less processing
power required on the server side. It is generally better than the close mode,
but not always, because the clients often limit their concurrent connections to
a smaller value.

A last improvement in the communications is the pipelining mode. It still uses
keep-alive, but the client does not wait for the first response to send the
second request. This is useful for fetching a large number of images composing
a page :

  [CON] [REQ1] [REQ2] … [RESP1] [RESP2] [CLO] …

This can obviously have a tremendous benefit on performance because the network
latency is eliminated between subsequent requests. Many HTTP agents do not
correctly support pipelining since there is no way to associate a response with
the corresponding request in HTTP. For this reason, it is mandatory for the
server to reply in the exact same order as the requests were received.
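
Since responses carry no identifier, the only way a client can pair them with requests is by position. The following Python sketch illustrates that first-in, first-out pairing; `match_pipelined` and the string placeholders are hypothetical names used only for this example.

```python
from collections import deque


def match_pipelined(requests, responses):
    """Pair each response with the oldest request still awaiting an answer."""
    pending = deque(requests)
    pairs = []
    for response in responses:
        if not pending:
            raise ValueError("received a response with no pending request")
        # The only possible association is by order of arrival.
        pairs.append((pending.popleft(), response))
    return pairs


pairs = match_pipelined(["REQ1", "REQ2"], ["RESP1", "RESP2"])
```

If the server answered out of order, this pairing would silently attach the wrong response to each request, which is why the ordering requirement is strict.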

By default HAProxy operates in keep-alive mode with regard to persistent
connections: for each connection it processes each request and response, and
leaves the connection idle on both sides between the end of a response and the
start of a new request.

HAProxy supports 5 connection modes :
  - keep alive    : all requests and responses are processed (default)
  - tunnel        : only the first request and response are processed,
                    everything else is forwarded with no analysis.
  - passive close : tunnel with “Connection: close” added in both directions.
  - server close  : the server-facing connection is closed after the response.
  - forced close  : the connection is actively closed after end of response.

## 1.2. HTTP request

First, let’s consider this HTTP request :

  Line     Contents
  number
     1     GET /serv/login.php?lang=en&profile=2 HTTP/1.1
     2     Host: www.mydomain.com
     3     User-agent: my small browser
     4     Accept: image/jpeg, image/gif
     5     Accept: image/png


### 1.2.1. The Request line

Line 1 is the “request line”. It is always composed of 3 fields :

  - a METHOD      : GET
  - a URI         : /serv/login.php?lang=en&profile=2
  - a version tag : HTTP/1.1

All of them are delimited by what the standard calls LWS (linear white spaces),
which are commonly spaces, but can also be tabs or line feeds/carriage returns
followed by spaces/tabs. The method itself cannot contain any colon (‘:’) and
is limited to alphabetic letters. All those various combinations make it
desirable that HAProxy performs the splitting itself rather than leaving it to
the user to write a complex or inaccurate regular expression.
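
As a rough illustration of the three-field split (not HAProxy's actual parser), the Python sketch below cuts a request line on runs of whitespace and applies the method rule stated above. The name `parse_request_line` is an assumption for this example.

```python
def parse_request_line(line):
    """Split a request line into (method, uri, version).

    str.split() with no argument splits on any run of whitespace, which
    loosely approximates the LWS separators described above.
    """
    parts = line.split()
    if len(parts) != 3:
        raise ValueError("a request line must have exactly 3 fields")
    method, uri, version = parts
    # The method is limited to alphabetic letters, so it cannot hold a colon.
    if not (method.isascii() and method.isalpha()):
        raise ValueError("invalid method: %r" % method)
    return method, uri, version


fields = parse_request_line("GET /serv/login.php?lang=en&profile=2 HTTP/1.1")
```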

The URI itself can have several forms :

  - A “relative URI” :

      /serv/login.php?lang=en&profile=2

    It is a complete URL without the host part. This is generally what is
    received by servers, reverse proxies and transparent proxies.

  - An “absolute URI”, also called a “URL” :

      http://192.168.0.12:8080/serv/login.php?lang=en&profile=2

    It is composed of a “scheme” (the protocol name followed by ‘://’), a host
    name or address, optionally a colon (‘:’) followed by a port number, then
    a relative URI beginning at the first slash (‘/’) after the address part.
    This is generally what proxies receive, but a server supporting HTTP/1.1
    must accept this form too.

  - a star (‘*’) : this form is only accepted in association with the OPTIONS
    method and is not relayable. It is used to inquire about a next hop’s
    capabilities.

  - an address:port combination : 192.168.0.12:80
    This is used with the CONNECT method, which is used to establish TCP
    tunnels through HTTP proxies, generally for HTTPS, but sometimes for
    other protocols too.

In a relative URI, two sub-parts are identified. The part before the question
mark is called the “path”. It is typically the relative path to static objects
on the server. The part after the question mark is called the “query string”.
It is mostly used with GET requests sent to dynamic scripts and is very
specific to the language, framework or application in use.
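
The four URI forms and the path/query split can be sketched in Python as follows. This is only an illustration under simplifying assumptions: `describe_uri` is a hypothetical helper, the form is guessed from the string alone (a real parser would also look at the method), and `urllib.parse.urlsplit` from the standard library handles the absolute form.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Classify a request URI and break it into its sub-parts."""
    if uri == "*":
        return {"form": "star"}
    if uri.startswith("/"):
        # Relative URI: the path is before the question mark, the query after.
        path, _, query = uri.partition("?")
        return {"form": "relative", "path": path, "query": query}
    if "://" in uri:
        parts = urlsplit(uri)
        return {"form": "absolute", "scheme": parts.scheme,
                "host": parts.hostname, "port": parts.port,
                "path": parts.path, "query": parts.query}
    # Otherwise assume the address:port form used with CONNECT.
    host, _, port = uri.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("unrecognized URI form: %r" % uri)
    return {"form": "authority", "host": host, "port": int(port)}


relative = describe_uri("/serv/login.php?lang=en&profile=2")
absolute = describe_uri("http://192.168.0.12:8080/serv/login.php?lang=en&profile=2")
```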

### 1.2.2. The request headers

The headers start at the second line. They are composed of a name at the
beginning of the line, immediately followed by a colon (‘:’). Traditionally,
an LWS is added after the colon but that’s not required. Then come the values.
Multiple identical headers may be folded into one single line, delimiting the
values with commas, provided that their order is respected. This is commonly
encountered in the “Cookie:” field. A header may span over multiple lines if
the subsequent lines begin with an LWS. In the example in 1.2, lines 4 and 5
define a total of 3 values for the “Accept:” header.
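
The rules above (continuation lines, repeated headers, comma-separated values) can be illustrated with a small Python sketch. It is deliberately naive and is not HAProxy's parser: `parse_headers` is a hypothetical name, header names are lowercased for lookup, and values are split on every comma, which would be wrong for values that legitimately contain commas such as dates.

```python
def parse_headers(lines):
    """Parse header lines (without line endings) into {name: [values]}."""
    unfolded = []
    for line in lines:
        if line[:1] in (" ", "\t") and unfolded:
            # A line starting with LWS continues the previous header.
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line)

    headers = {}
    for line in unfolded:
        name, _, value = line.partition(":")
        values = [v.strip() for v in value.split(",") if v.strip()]
        # Identical headers accumulate their values in order of appearance.
        headers.setdefault(name.strip().lower(), []).extend(values)
    return headers


headers = parse_headers([
    "Host: www.mydomain.com",
    "User-agent: my small browser",
    "Accept: image/jpeg, image/gif",
    "Accept: image/png",
])
```

With the request from section 1.2, the two “Accept:” lines end up as a single entry holding three values.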

Contrary to a common misconception, header names are not case-sensitive, and
their values are not either if they refer to other header names (such as the
“Connection:” header).

The end of the headers is indicated by the first empty line. People often say
that it’s a double line feed, which is not exact, even if a double line feed
is one valid form of empty line.
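
To show why “double line feed” is only one of the valid forms, the Python sketch below locates the first empty line whether lines end with a bare line feed or with a carriage return plus line feed. The name `split_message` and the regular expression are assumptions made for this illustration.

```python
import re

# An empty line: one line ending immediately followed by another,
# where each ending is either LF or CR LF.
END_OF_HEADERS = re.compile(rb"\r?\n\r?\n")


def split_message(raw):
    """Return (head, body) split at the first empty line, or None."""
    match = END_OF_HEADERS.search(raw)
    if match is None:
        return None  # headers are not complete yet
    return raw[:match.start()], raw[match.end():]


crlf = split_message(b"GET / HTTP/1.1\r\nHost: a\r\n\r\nbody")
lf = split_message(b"GET / HTTP/1.1\nHost: a\n\nbody")
```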

Fortunately, HAProxy takes care of all these complex combinations when indexing
headers, checking values and counting them, so there is no reason to worry
about the way they could be written, but it is important not to accuse an
application of being buggy if it does unusual, valid things.

Important note:
   As suggested by RFC2616, HAProxy normalizes headers by replacing line breaks
   in the middle of headers by LWS in order to join multi-line headers. This
   is necessary for proper analysis and helps less capable HTTP parsers to work
   correctly and not to be fooled by such complex constructs.

## 1.3. HTTP response

An HTTP response looks very much like an HTTP request. Both are called HTTP
messages. Let’s consider this HTTP response :

  Line     Contents
  number
     1     HTTP/1.1 200 OK
     2     Content-length: 350
     3     Content-Type: text/html

As a special case, HTTP supports so called “Informational responses” as status
codes 1xx. These messages are special in that they don’t convey any part of the
response, they’re just used as sort of a signaling message to ask a client to
continue to post its request for instance. In the case of a status 100 response
the requested information will be carried by the next non-100 response message
following the informational one. This implies that multiple responses may be
sent to a single request, and that this only works when keep-alive is enabled
(1xx messages are HTTP/1.1 only). HAProxy handles these messages and is able to
correctly forward and skip them, and only process the next non-100 response. As
such, these messages are neither logged nor transformed, unless explicitly
stated otherwise. Status 101 messages indicate that the protocol is changing
over the same connection and that HAProxy must switch to tunnel mode, just as
if a CONNECT had occurred. Then the Upgrade header would contain additional
information about the type of protocol the connection is switching to.
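
The handling of informational responses described above can be sketched in Python as a small loop. This only illustrates the logic and says nothing about HAProxy's internals; `final_response` and the `(status, payload)` tuples are hypothetical.

```python
def final_response(responses):
    """Pick the response that actually answers the request.

    `responses` is an iterable of (status, payload) tuples in arrival order.
    """
    for status, payload in responses:
        if status == 101:
            # Protocol switch: stop parsing HTTP and tunnel the connection.
            return ("tunnel", status, payload)
        if 100 <= status < 200:
            # Other informational messages are forwarded and skipped.
            continue
        return ("final", status, payload)
    return None


result = final_response([(100, ""), (200, "<html>...</html>")])
```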

### 1.3.1. The Response line

Line 1 is the “response line”. It is always composed of 3 fields :

  - a version tag : HTTP/1.1
  - a status code : 200
  - a reason      : OK

The status code is always 3-digit. The first digit indicates a general status :
 - 1xx = informational message to be skipped (eg: 100, 101)
 - 2xx = OK, content is following   (eg: 200, 206)
 - 3xx = OK, no content following   (eg: 302, 304)
 - 4xx = error caused by the client (eg: 401, 403, 404)
 - 5xx = error caused by the server (eg: 500, 502, 503)

Please refer to RFC2616 for the detailed meaning of all such codes. The
“reason” field is just a hint, but is not parsed by clients. Anything can be
found there, but it’s a common practice to respect the well-established
messages. It can be composed of one or multiple words, such as “OK”, “Found”,
or “Authentication Required”.
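
As an illustration of the response line layout and of the first-digit rule, here is a minimal Python sketch. The names `parse_status_line`, `status_class` and `STATUS_CLASSES` are assumptions for this example; the class descriptions are copied from the list above.

```python
STATUS_CLASSES = {
    1: "informational message to be skipped",
    2: "OK, content is following",
    3: "OK, no content following",
    4: "error caused by the client",
    5: "error caused by the server",
}


def parse_status_line(line):
    """Split a response line into (version, status code, reason)."""
    # Split at most twice so that a multi-word reason stays in one piece.
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("a response line needs a version and a status code")
    version, code = parts[0], parts[1]
    reason = parts[2] if len(parts) == 3 else ""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError("the status code must be exactly 3 digits")
    return version, int(code), reason


def status_class(code):
    """Describe a status code from its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


parsed = parse_status_line("HTTP/1.1 401 Authentication Required")
```

The reason text is returned untouched because it is only a hint; decisions should be taken on the numeric code.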

HAProxy may emit the following status codes by itself :

  Code  When / reason
   200  access to stats page, and when replying to monitoring requests
   301  when performing a redirection, depending on the configured code
   302  when performing a redirection, depending on the configured code
   303  when performing a redirection, depending on the configured code
   307  when performing a redirection, depending on the configured code
   308  when performing a redirection, depending on the configured code
   400  for an invalid or too large request
   401  when an authentication is required to perform the action (when
        accessing the stats page)
   403  when a request is forbidden by a “block” ACL or “reqdeny” filter
   408  when the request timeout strikes before the request is complete
   500  when HAProxy encounters an unrecoverable internal error, such as a
        memory allocation failure, which should never happen
   502  when the server returns an empty, invalid or incomplete response, or
        when an “rspdeny” filter blocks the response.
   503  when no server was available to handle the request, or in response to
        monitoring requests which match the “monitor fail” condition
   504  when the response timeout strikes before the server responds

The error 4xx and 5xx codes above may be customized (see “errorloc” in section
4.2).

### 1.3.2. The response headers

Response headers work exactly like request headers, and as such, HAProxy uses
the same parsing function for both. Please refer to paragraph 1.2.2 for more
details.