wiki:soc/2007/cgi

Version 7 (modified by Darren Garvey, 15 years ago)


About

Below is a partial list of things this project aims to provide:

  • The Controller part of the model-view-controller idiom.
  • Simple access to standard CGI environment variables and input data.
  • Clean access to request meta-data (i.e. 'environment variables') and input data when using alternative protocols, such as FastCGI.
  • Asynchronous read/write support.
  • A clean way to write to clients without knowledge of the underlying protocol in use.
  • Minimal process initialisation time: clients will usually be served by several process images, so whenever a new one starts up there must be no noticeable load time before it handles its first request.
  • Basic session support: by default Boost.Interprocess will be used to store session data, although the SessionAdapter concept should allow other storage methods (such as files, databases or in-process memory) to be added later.
  • Internationalisation support should be considered, although I'm not yet sure to what extent this can be tackled.

Usage

First, a standard CGI example:

int main()
{
  cgi::request req;   // set up the request
  cgi::response resp; // see Design Notes for more about this

  resp << "Hello, " << req.param<cgi::GET>("user_name") << "!";
  resp.send(req);
  
  return 0;
}

Using an alternative protocol (FastCGI in this case) changes the above as follows:

int sub_main(cgi::request req)
{
  cgi::response resp; // see Design Notes for more about this

  resp << "Hello, " << req.param<cgi::POST>("user_name") << "!";
  resp.send(req);

  return 0;
}

int main()
{
  cgi::fcgi_service service(&sub_main);
  service.run();
  
  return 0;
}

Design Notes

Past discussion starts in these threads:
http://lists.boost.org/Archives/boost/2007/04/120191.php
http://lists.boost.org/Archives/boost/2007/04/119565.php

See Concepts for more.

Separation of cgi::request and cgi::response

This separation is a recent change. The main reason for it is that meta-data exists for both the request and the response. Using getters/setters on a single object is one idea, but in a large program you may set a response header and need to check it again later. If everything were done through the request object, there would be no way to do this.

Having two objects has other advantages:

  • Code is clearer, without being too verbose
  • Response caching is easier to implement; code can just cache a cgi::response since it holds no data relevant to the specific request (note: response caching isn't really part of this project, although cgi::session will probably provide basic facilities)
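To make these advantages concrete, here is a minimal sketch of a response object that holds only headers and content. Everything in it (the class name `response`, `set_header`, `has_header`, `header`, `send_to` and the `fake_request` stand-in) is hypothetical and is not this library's API; it only shows that a header set early can be checked later, and that one response can be kept in a cache and sent to several requests.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for whatever the response is eventually sent to.
struct fake_request {
  std::string output;
};

// Hypothetical response: holds headers and body only, and knows nothing
// about the request it will eventually answer.
class response {
public:
  void set_header(const std::string& name, const std::string& value) {
    headers_[name] = value;
  }
  // Later code can inspect a header that was set earlier.
  bool has_header(const std::string& name) const {
    return headers_.count(name) != 0;
  }
  std::string header(const std::string& name) const {
    auto it = headers_.find(name);
    return it == headers_.end() ? std::string() : it->second;
  }
  // Append anything streamable to the body.
  template <typename T>
  response& operator<<(const T& value) {
    std::ostringstream os;
    os << value;
    body_ += os.str();
    return *this;
  }
  // Only at send time is the response bound to a particular request.
  // Being copyable and request-agnostic, it can be cached and reused.
  void send_to(fake_request& req) const {
    for (const auto& h : headers_)
      req.output += h.first + ": " + h.second + "\r\n";
    req.output += "\r\n" + body_;
  }

private:
  std::map<std::string, std::string> headers_;
  std::string body_;
};
```

Sending the same (possibly cached) `response` to two requests produces identical output, since nothing in it depends on either request.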

Having the CommonGatewayService control threading

  • To keep program initialisation time (i.e. the time before the first request begins to be handled) to a minimum, pre-emptively spawning threads is not an option.
  • Users shouldn't have to implement complex event handling mechanisms in order to write a responsive, multiplexed application.
  • Since CGI request handlers can take a long time to complete, doing I/O and request handling in the same threads can starve I/O. Guaranteeing that handlers will only be called in threads calling basic_service<>::run() might cause this and other problems.
    Options:
    1. There should be threads calling basic_service<>::run() which handle input and output, and a separate set of threads which run the request handler provided by the user.
    2. Only threads calling basic_service<>::run() should handle requests, but the service should be able to increase the number of threads calling run().
    3. If a user really wants everything handled in the same threads, passing a boost::thread_group to the service's constructor could provide a compromise: the user uses the thread_group to call basic_service<>::run() and the service uses it to dispatch request handlers.
  • As a consequence of the above points, the number of running threads should be variable. Without this, programs would become unresponsive whenever the user wasn't using a Proactive (i.e. asynchronous) model or had no good way of monitoring the service and adapting the number of threads.
  • When threading support isn't available, a FastCGI application (for instance) should still 'just work', with request handlers running concurrently instead of in parallel.
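Option 1 above can be sketched with the standard library alone. This is a hypothetical illustration rather than the library's design: `handler_queue` and `run_split_service` are invented names, a single thread stands in for the threads calling basic_service<>::run(), and a separate worker pool runs the handlers so that a slow handler cannot block I/O.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue shared by the I/O side and the handler side.
class handler_queue {
public:
  void push(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(m_);
      jobs_.push(std::move(job));
    }
    cv_.notify_one();
  }
  // No more jobs will arrive; wake every waiting worker.
  void close() {
    {
      std::lock_guard<std::mutex> lock(m_);
      closed_ = true;
    }
    cv_.notify_all();
  }
  // Blocks for a job; returns false once the queue is closed and drained.
  bool pop(std::function<void()>& job) {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return closed_ || !jobs_.empty(); });
    if (jobs_.empty()) return false;
    job = std::move(jobs_.front());
    jobs_.pop();
    return true;
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> jobs_;
  bool closed_ = false;
};

// One "I/O" thread accepts requests and queues their handlers; a separate
// set of worker threads runs the (possibly slow) user handlers.
// Returns how many handlers ran.
int run_split_service(int requests, int workers) {
  handler_queue queue;
  std::mutex result_mutex;
  int handled = 0;

  std::vector<std::thread> pool;
  for (int i = 0; i < workers; ++i)
    pool.emplace_back([&] {
      std::function<void()> job;
      while (queue.pop(job)) job();
    });

  std::thread io([&] {
    for (int r = 0; r < requests; ++r)
      queue.push([&] {  // stand-in for a user request handler
        std::lock_guard<std::mutex> lock(result_mutex);
        ++handled;
      });
    queue.close();
  });

  io.join();
  for (auto& t : pool) t.join();
  return handled;
}
```

With this split, the I/O thread only ever enqueues work, so the number of worker threads can be tuned (or grown at runtime) independently of I/O.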

Thread-pool vs. thread-per-request

Note: since each connection can multiplex several requests, a thread-per-connection policy doesn't make sense, as it would make request response times inconsistent.

In general, a thread pool will be more efficient than a thread per request, especially when the reply is no more than a true/false statement (e.g. in the Authorizer FastCGI role). A thread-per-request option should still exist, though: with a thread pool, thread-local storage outlives the request that filled it, so one request's data can leak into the next request handled by the same thread, making an application less secure.
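The thread-local storage concern can be demonstrated directly. In this sketch (all names hypothetical), a handler stores per-request data in a `thread_local` variable. When one reused thread handles two requests, as in a pool, the second request sees the first one's data; with a fresh thread per request, each handler starts clean.

```cpp
#include <cassert>
#include <string>
#include <thread>

// Hypothetical per-thread data that a handler might (unwisely) rely on.
thread_local std::string tls_user;

// Simulated handler: returns whatever the thread-local already held,
// then stores this request's user in it.
std::string handle(const std::string& user) {
  std::string previous = tls_user;
  tls_user = user;
  return previous;
}

// Pool-like reuse: one worker thread handles two requests in sequence.
std::string leaked_in_pool() {
  std::string seen;
  std::thread worker([&] {
    handle("alice");
    seen = handle("bob");  // second request observes the first's data
  });
  worker.join();
  return seen;
}

// Thread-per-request: each request runs on a new thread with fresh TLS.
std::string leaked_per_request() {
  std::string seen;
  std::thread first([] { handle("alice"); });
  first.join();
  std::thread second([&] { seen = handle("bob"); });
  second.join();
  return seen;
}
```

A pooled design could reset such state between requests, but that relies on every handler cooperating, which is why a thread-per-request option is worth keeping.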

Single-threaded application

If threading support isn't available, all services should still work. Also, the style of the first example (above) should map to a FastCGI application simply by creating an fcgi_service first and then passing it to cgi::request's constructor.

Main Classes

cgi::basic_request<>

This holds the data corresponding to the request. It will be specific to a Protocol type and will be aware of how to receive, send and parse data for that Protocol. There will be typedefs for typical usage.

cgi::request

By default, this provides a general (as opposed to generic) access point to any type of request. If constructed with a service object, it takes a pending request from the service's queue (or blocks until one is available). Default construction initialises a standard CGI environment.

This generality is achieved through runtime polymorphism, in a similar way to boost::any, although static (compile-time) binding can be forced using one of a choice of macros that turn cgi::request into a typedef for a particular cgi::basic_request<>.
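A minimal sketch of the runtime/static choice, assuming a boost::any-style type-erasure wrapper. The types `cgi_request_impl` and `fcgi_request_impl` and the macro `HYPOTHETICAL_CGI_FORCE_FCGI` are invented for illustration; the real macro names are not specified on this page.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Two hypothetical protocol-specific request types, standing in for
// particular cgi::basic_request<> instantiations.
struct cgi_request_impl {
  std::string protocol() const { return "cgi"; }
};
struct fcgi_request_impl {
  std::string protocol() const { return "fcgi"; }
};

#ifdef HYPOTHETICAL_CGI_FORCE_FCGI
// Static binding: 'request' is just a typedef, so there is no virtual call.
using request = fcgi_request_impl;
#else
// Runtime binding: 'request' can wrap any protocol's request type.
class request {
public:
  // Default construction: a standard CGI request.
  request() : request(cgi_request_impl{}) {}

  template <typename Impl>
  explicit request(Impl impl)
      : self_(std::make_shared<model<Impl>>(std::move(impl))) {}

  std::string protocol() const { return self_->protocol(); }

private:
  // Type-erased interface every wrapped request type is adapted to.
  struct concept_t {
    virtual ~concept_t() = default;
    virtual std::string protocol() const = 0;
  };
  template <typename Impl>
  struct model : concept_t {
    explicit model(Impl i) : impl(std::move(i)) {}
    std::string protocol() const override { return impl.protocol(); }
    Impl impl;
  };
  std::shared_ptr<const concept_t> self_;
};
#endif
```

User code written against `request` compiles either way; defining the macro trades the flexibility of the type-erased wrapper for direct, non-virtual calls.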

cgi::response

This simply holds the headers and content of the response and provides various ways to write to it. Until it is sent, it is unaware of what it is a response to. This keeps both library and user code clean and explicit without being overly verbose, and significantly aids response caching (something this library won't address for now).

cgi::session

This will provide simple session data caching.

cgi::basic_service<>

This is the main class in the library. There should be specializations for each Protocol and the underlying structure should be generic enough to allow for any type of cgi-like protocol to be 'serviced', without sacrificing efficiency, clarity of code or any of the aims stated in the Design Notes.
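A sketch of per-protocol specializations of a `basic_service` template. The tag types `cgi_tag` and `fcgi_tag`, the `max_concurrent_requests()` member and the default limit of 16 are all hypothetical; only the names `basic_service` and `fcgi_service` come from this page.

```cpp
#include <cassert>
#include <string>

// Hypothetical protocol tags.
struct cgi_tag {};
struct fcgi_tag {};

// Primary template is declared but never defined, so only protocols
// with a specialization can be used.
template <typename Protocol>
class basic_service;

// Standard CGI: one request per process, nothing to multiplex.
template <>
class basic_service<cgi_tag> {
public:
  int max_concurrent_requests() const { return 1; }
  std::string name() const { return "cgi"; }
};

// FastCGI: many requests per process (the default limit is arbitrary).
template <>
class basic_service<fcgi_tag> {
public:
  explicit basic_service(int max_requests = 16) : max_(max_requests) {}
  int max_concurrent_requests() const { return max_; }
  std::string name() const { return "fcgi"; }

private:
  int max_;
};

// Typedefs for typical usage.
using cgi_service = basic_service<cgi_tag>;
using fcgi_service = basic_service<fcgi_tag>;
```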

Important Internal Classes

cgi::gateway<>

The gateway is the abstraction of the interface with the server. This can vary from just an abstraction of std::cin/cout to a fully multiplexed set of connections, which can themselves be of more than one type.

cgi::acceptor<>

Accepts connections. This should probably accept on only one connection type (meaning the gateway would be responsible for retrying an accept on a different connection type, if the current protocol allows that).

cgi::dispatcher<>

Completion handlers should be sent to a dispatcher to be called since a variety of dispatching methods should be available. The most basic/general is a thread-per-request dispatcher; more efficient would be a thread-pool dispatcher. Users should be able to implement their own dispatcher if they wish without having to know about any other class: encapsulation is important. Note: asio already has a dispatcher, so what can be used from that?
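One possible shape for the dispatcher concept, sketched with the standard library: any type providing `dispatch(handler)` and `wait()` can be plugged in, so a user-defined dispatcher needs no knowledge of other classes. All names here are hypothetical. For comparison, Boost.Asio's `io_service` already offers `post()` and `dispatch()` for handing handlers to threads running `run()`.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Thread-per-request: every completion handler gets its own thread.
class thread_per_request_dispatcher {
public:
  void dispatch(std::function<void()> handler) {
    threads_.emplace_back(std::move(handler));
  }
  // Block until every dispatched handler has finished.
  void wait() {
    for (auto& t : threads_) t.join();
    threads_.clear();
  }
  ~thread_per_request_dispatcher() { wait(); }

private:
  std::vector<std::thread> threads_;
};

// A user-defined policy: run the handler immediately in the calling
// thread, e.g. when no threading support is available.
class inline_dispatcher {
public:
  void dispatch(std::function<void()> handler) { handler(); }
  void wait() {}
};

// Stand-in for the service: it only relies on the Dispatcher interface.
template <typename Dispatcher>
void complete_requests(Dispatcher& d, int count, std::atomic<int>& handled) {
  for (int i = 0; i < count; ++i)
    d.dispatch([&handled] { ++handled; });
  d.wait();
}
```

A thread-pool dispatcher would implement the same two members on top of a shared work queue.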

Random Notes

  • The active requests queue should hold boost::weak_ptr<basic_request<> >s. That means a request will be destroyed properly unless it is on the pending requests queue or is actually being handled by user code. Note: to keep a request alive during asynchronous operations, the Handler the user passes to the async request should be a function object that holds a boost::shared_ptr<request_base>.
  • The standard CGI sub-library should be header-only. Ideally the other parts (e.g. fcgi_service) will be usable either header-only or as a compiled library, with the compiled library as the default: a multi-process FastCGI server pool is the most common use, so a shared '[MVC-]controller' library is likely to be quite effective.
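The ownership scheme in the first note can be sketched with `std::weak_ptr` and `std::shared_ptr`, used here in place of their Boost equivalents. The `id` member of `request_base`, the global `active_requests` list and the `read_handler` function object are hypothetical: the list only observes requests, while the handler object keeps its request alive for as long as an asynchronous operation holds it.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical request type (the library's would be basic_request<>).
struct request_base {
  int id;
};

// The active-requests list observes requests but never owns them.
std::vector<std::weak_ptr<request_base>> active_requests;

// Count requests still owned somewhere (pending queue or user code).
int live_requests() {
  int n = 0;
  for (const auto& w : active_requests)
    if (!w.expired()) ++n;
  return n;
}

// Handler passed to an async operation: holding a shared_ptr keeps the
// request alive until the handler itself is destroyed.
struct read_handler {
  std::shared_ptr<request_base> req;
  void operator()() const { /* would use req->id here */ }
};
```

Once the last owner releases a request, its entry in `active_requests` simply expires, so the list never prolongs a request's lifetime.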