Version 14 (modified by Darren Garvey, 15 years ago)

About

Below is a partial list of things this project aims to provide:

  • An implementation of the Controller part of the model-view-controller idiom for any CGI-compatible protocol.
  • Simple access to standard CGI environment variables and input data.
  • Easy access to request meta-data (i.e. environment variables) and input data when using alternative protocols, such as FastCGI.
  • Asynchronous read/write support to encourage multi-threaded *CGI daemons.
  • A uniform way to write to clients without knowledge of the underlying protocol in use.
  • Minimal process initialisation time: clients will usually be handled by multiple process images, so starting one up must not add a noticeable delay before the first request is handled.
  • Basic session support: by default Boost.Interprocess will be used for storing session data, although the SessionAdapter concept should allow other storage methods to be added later (such as files, databases or in-process memory).
  • Internationalisation support should be considered, although I'm not yet sure to what extent it can be tackled.
  • An interface that encourages separation of client-supplied variables (e.g. GET vs. POST), but does not enforce it.
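
As a rough illustration of the last aim, the sketch below keeps GET and POST variables in separate maps while still offering a combined lookup. The request_data struct and its members are hypothetical stand-ins rather than the library's interface, and the choice to let GET take precedence in form() is an assumption made only for this example.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a request's input data; not the library's API.
struct request_data
{
  typedef std::map<std::string, std::string> var_map;

  var_map get_vars;  // variables parsed from the query string
  var_map post_vars; // variables parsed from the request body

  // Look up a variable in one map; returns an empty string if it is missing.
  static std::string find_in(const var_map& vars, const std::string& name)
  {
    var_map::const_iterator it = vars.find(name);
    return it == vars.end() ? std::string() : it->second;
  }

  // Source-specific lookups: the caller states where the value must come from.
  std::string GET(const std::string& name) const  { return find_in(get_vars, name); }
  std::string POST(const std::string& name) const { return find_in(post_vars, name); }

  // Combined lookup for callers that do not care about the source.
  // Assumption: a GET variable wins when both maps contain the name.
  std::string form(const std::string& name) const
  {
    return get_vars.count(name) ? GET(name) : POST(name);
  }
};
```

Code that cares about the source of a value calls GET() or POST(); code that does not can call form(), so the separation is encouraged but not enforced.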

Usage

Note: these examples are not yet compilable, but are expected to be reasonably soon.

First, a standard CGI example:

#include <boost/cgi.hpp>
#include <string>

using namespace boost::cgi;

int main()
{
  request req;   // set up the request
  response resp; // make an empty response

  std::string name(req.GET("user_name"));
  resp<< "Hello there, "
      << (name.empty()? "stranger" : name)
      << "!";
  resp.send(req.client());
  
  return 0;
}

Protocols such as SCGI and FastCGI allow a single process to handle multiple requests. The simplest way to turn the above into a FastCGI daemon is to use the synchronous API provided by the library. For example:

#include <boost/cgi/fcgi.hpp> // Include only FastCGI functionality
#include <string>

using namespace boost::fcgi;

int main()
{
  service s; // This manages the request queue, amongst other things. More about this later.
  request req(s); // Constructed with the service from above.
  response resp;
  acceptor a(s); // This is responsible for 'accepting' requests from the server.

  // Initialise anything else you want to keep between requests here
  
  while(a.accept(req) == 0) // no error here
  {
    std::string name(req.GET("user_name"));
    resp<< "Hello there, "
        << (name.empty()? "stranger" : name)
        << "!";
    resp.send(req.client());
    req.close(http::ok);
    resp.clear();
  }

  return 0;
}

Using a multiplexing protocol (SCGI or FastCGI) in a multi-threaded environment can be much more flexible than the example above allows. In many (if not most) cases you will be able to increase the throughput of your daemon by handling more than one request at a time (i.e. within each process). To get the most out of your process you should use the asynchronous functionality of the library, kindly provided by Boost.Asio. Writing a fully asynchronous program requires a different approach from writing a synchronous one and can be quite mind-bending until you are used to it.

You can still benefit from the asynchronous nature of the library without complicating your program by distinguishing between accepting requests and using them. For the example below, we first create a Server, whose purpose is to accept requests and pre-load the data from the clients (this is likely sub-optimal, but keeps the demonstration to the point). The class shown is also not generic (it is only usable with FastCGI programs), but this is just to keep it concise. A generic version is provided (or should be soon) in the distribution.

#include <boost/cgi/fcgi.hpp>
#include <boost/bind.hpp>
#include <boost/function.hpp> // Really cool library!

namespace fcgi = boost::fcgi; // lets the code below write fcgi::request
using namespace boost::fcgi;  // as in the earlier examples (for http::...)

// Define the Server class
class Server
{
public:
  typedef boost::function<int (fcgi::request&)> handler_type;

  // Construct with the function to call for each loaded request.
  explicit Server(handler_type handler)
    : service_()
    , acceptor_(service_)
    , handler_(handler)
  {
  }

  void run()
  {
    start_accept();
  }

  void start_accept()
  {
    fcgi::request::pointer new_request(fcgi::request::create(service_));
    // Bind 'this' and leave a placeholder for the error code.
    acceptor_.async_accept(new_request,
      boost::bind(&Server::handle_accept, this, new_request, _1));
  }

  void handle_accept(fcgi::request::pointer req, const boost::system::error_code& ec)
  {
    if (!ec) { // no errors, so load the request data
      req->async_load(fcgi::parse_all,
        boost::bind(&Server::handle_load, this, req, _1));
    }
    start_accept(); // start accepting another request
  }

  void handle_load(fcgi::request::pointer req, const boost::system::error_code& ec)
  {
    if (!ec) { // no errors
      handler_(*req); // Call the user-supplied handler
    } else {
      req->abort(http::bad_request); // req is a pointer, so use ->
    }
  }
private:
  fcgi::service  service_;
  fcgi::acceptor acceptor_;
  handler_type   handler_; // The user-supplied handler function.
};

You'll notice that this server runs in an infinite loop and may accept requests faster than you can handle them. We'll deal with that later. For now, we can use the example class above.

#include <boost/cgi/fcgi.hpp>
#include "Server.hpp" // The above example

using namespace boost::fcgi;

// This is where you deal with the request
int sub_main(request& req)
{
  response resp;

  resp<< "Hello, "
      << req.form("user_name") // this accepts either GET or POST variables.
      << "!";
  resp.send(req.client());

  return req.close(http::ok); // we have to explicitly close the requests now.
}

int main()
{
  Server server(&sub_main);
  server.run();
  
  return 0;
}

And that's it!

(below notes are outdated)

Design Notes

Past discussion can be found (starting) here:
http://lists.boost.org/Archives/boost/2007/04/120191.php
http://lists.boost.org/Archives/boost/2007/04/119565.php

See Concepts for more.

Separation of cgi::request and cgi::response:

This separation is only a recent change. The main reasoning is that equivalent meta-data exists for both the request and the response (i.e. same identifier, different value). Using getters/setters on a single object is one idea, although in a large program there could be a situation where you set a response header and then need to check it later. If everything were done with the request object, there would be no clean way to achieve this.

Having two objects has other advantages:

  • Code is clearer, without being too verbose
  • Response caching is easier to implement; code can just cache a cgi::response since it holds no data relevant to the specific request (note: response caching isn't really part of this project, although cgi::session will probably provide basic facilities)

Having the CommonGatewayService control threading

See Dispatching.

Main Classes

cgi::basic_request<>

This holds the data corresponding to the request. It will be specific to a Protocol type and will be aware of how to receive, send and parse data for that Protocol. There will be typedefs for typical usage.

cgi::request

By default, this provides a general (as opposed to generic) access point to any type of request. If constructed with a service object, the request takes a request from the queue (or blocks until one is available). Default construction initialises a standard CGI environment.

This generality is achieved using runtime linkage in a similar way to boost::any, although static linking can be forced using a choice of macros which turn cgi::request into a typedef for a particular cgi::basic_request<>.
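
The boost::any-style approach can be sketched roughly as follows. All names here (any_request, cgi_request, fcgi_request and the EXAMPLE_USE_STATIC_CGI macro) are made up for illustration, and std::shared_ptr is used only to keep the sketch self-contained; the library's real types and macros may well differ.

```cpp
#include <memory>
#include <string>

// Hypothetical protocol-specific request types, standing in for
// instantiations of cgi::basic_request<>.
struct cgi_request  { std::string protocol() const { return "cgi"; } };
struct fcgi_request { std::string protocol() const { return "fcgi"; } };

// A general request that can hold any type providing protocol().
class any_request
{
  // Abstract interface through which the held object is reached at runtime.
  struct holder_base
  {
    virtual ~holder_base() {}
    virtual std::string protocol() const = 0;
  };

  // Concrete holder generated for each protocol-specific request type.
  template<typename Request>
  struct holder : holder_base
  {
    explicit holder(const Request& r) : held(r) {}
    std::string protocol() const { return held.protocol(); }
    Request held;
  };

  std::shared_ptr<holder_base> impl_;

public:
  // Accept any request type; the concrete type is erased behind holder_base.
  template<typename Request>
  any_request(const Request& r) : impl_(new holder<Request>(r)) {}

  // Forward to whichever concrete request is held.
  std::string protocol() const { return impl_->protocol(); }
};

// Compile-time alternative: a macro turns 'request' into a plain typedef.
#if defined(EXAMPLE_USE_STATIC_CGI) // hypothetical macro name
typedef cgi_request request;
#else
typedef any_request request;
#endif
```

With the macro undefined, request can be initialised from either protocol type and dispatches through a virtual call; with it defined, request is simply cgi_request and the indirection disappears.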

cgi::response

This simply holds the headers and the content of the response and provides various ways to write to it. Until it is sent to the user, it is unaware of what it is a response to. This helps keep both library and user code clean and explicit without being overly verbose, and it aids significantly with response caching (something this library won't address for now).

Use of this class is entirely optional.

cgi::session

This will provide simple session data caching.

cgi::basic_protocol_service<>

This is the base class in the library. It is a container for the io_services (from Boost.Asio) that implement asynchronous support and holds lists of current connections and of waiting requests from multiplexed connections.

Important Internal Classes

cgi::basic_client<>

This is the abstraction of the interface with the server (the gateway). It can vary from just an abstraction of std::cin/cout to a fully multiplexed set of connections, which can themselves be of more than one type.

cgi::basic_request_acceptor<>

Accepts a new, possibly unloaded request. Before using the request, cgi::basic_request<>::load() or cgi::basic_request<>::async_load() must be called.

Random Notes

  • The active requests queue should hold boost::weak_ptr<basic_request<> >s. That means that the request will be properly destroyed unless it's on the pending requests queue or actually being handled by user code. Note: to keep it alive during asynchronous requests, the user should be using a function object with a boost::shared_ptr<request_base> that they pass to the Handler given to the async request.
  • The standard CGI sub-library should be header-only. Ideally the other parts (eg. fcgi_service) will be either header-only or compilable, with compilable as the default: a multi-process FastCGI server pool is the most common use, so using a shared '[MVC-]controller' library is likely to be quite effective.
  • Is there a need for a boost::lexical_cast<> wrapper? Something like cgi::param_as<char,cgi::GET>("blah") or cgi::get_as<int,cgi::POST>("user_id"). Consider:
    void a()
    {
      int id = 4096;
      cgi::request req;
      // Without a wrapper the cast is spelled out; with one it is part of the call.
      if( boost::lexical_cast<int>(req.param<cgi::POST>("user_id")) > (id / 4) &&
          req.param_as<int,cgi::POST>("user_id") != id )
      {
        // ...
      }
    }
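
One possible shape for such a wrapper is sketched below. The param_map typedef and this get_as() signature are assumptions made for the example, and std::istringstream is used in place of boost::lexical_cast<> purely so the sketch stands alone; the real wrapper could just as well forward to lexical_cast.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical storage for one set of client-supplied variables.
typedef std::map<std::string, std::string> param_map;

// Convert the named parameter to T, throwing if it is missing or malformed.
template<typename T>
T get_as(const param_map& params, const std::string& name)
{
  param_map::const_iterator it = params.find(name);
  if (it == params.end())
    throw std::out_of_range("no such parameter: " + name);

  std::istringstream in(it->second);
  T value;
  char extra;
  // Require a successful parse with no non-whitespace characters left over.
  if (!(in >> value) || (in >> extra))
    throw std::invalid_argument("bad value for parameter: " + name);
  return value;
}
```

A call such as get_as<int>(post_vars, "user_id") then replaces the explicit cast in the example above, and a value like "12abc" is rejected rather than silently truncated.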
    