Writing CGI Programs in C

I. Overview of CGI

CGI (Common Gateway Interface) is the interface standard that specifies how a web server calls other executable programs (CGI programs). The web server interacts with the web browser by calling the CGI program: the CGI program accepts the information the browser sent to the server, processes it, and sends the response back to the server and on to the browser. CGI programs typically handle form data from web pages, database queries, and integration with traditional application systems. They can be written in any programming language, such as shell script, Perl, Fortran, Pascal, or C. CGI programs written in C execute quickly and are relatively secure, since a C program is compiled before it runs and the executable cannot be modified the way a script can.

The CGI interface standard has three parts: standard input, environment variables, and standard output.

1. Standard input

Like any other executable, a CGI program can receive input from the web server through standard input (stdin), for example the data in a form. This is the POST method of passing data to a CGI program. It also means that the CGI program can be run from the operating system's command line for debugging. POST is the commonly used method, and this article uses it as the example when analyzing the methods, process, and techniques of CGI program design.

2. Environment variables

The operating system provides many environment variables, which define the execution environment of a program and which applications can read. The web server and the CGI interface also set some environment variables of their own to pass important parameters to the CGI program. The GET method passes form data to the CGI program this way too, through the environment variable QUERY_STRING.

3. Standard output

The CGI program sends its output to the web server through standard output (stdout). The output can be in various formats, usually plain text or HTML, so a CGI program can be debugged on the command line and its output inspected directly.

The following is a simple CGI program that sends the form data it receives straight back to the web browser as plain text.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int i, n;

        printf("Content-Type: text/plain\n\n");
        n = 0;
        if (getenv("CONTENT_LENGTH"))
            n = atoi(getenv("CONTENT_LENGTH"));
        for (i = 0; i < n; i++)
            putchar(getchar());
        putchar('\n');
        fflush(stdout);
        return 0;
    }

Below is a brief analysis of this program.

    printf("Content-Type: text/plain\n\n");

This line sends the string "Content-Type: text/plain\n\n" to the web server through standard output. It is a MIME header that tells the web server the output that follows is plain ASCII text. Note the two newline characters in this header: the web server needs to see a blank line before the actual text begins.

    if (getenv("CONTENT_LENGTH"))
        n = atoi(getenv("CONTENT_LENGTH"));

This line first checks whether the environment variable CONTENT_LENGTH exists. The web server sets it when it calls the CGI program with the POST method, and its text value gives the number of characters the server passes to the program's input, so atoi() is used to convert the value to an integer and store it in n. Note that the web server does not terminate its output with an end-of-file character, so without checking CONTENT_LENGTH the CGI program cannot know when the input ends.
    for (i = 0; i < n; i++)
        putchar(getchar());

This loop runs from 0 to CONTENT_LENGTH - 1 and copies every character read from standard input directly to standard output, so all of the input is sent back to the web server as ASCII text.

From this example, the general working process of a CGI program can be summarized as follows.

1. Check the environment variable CONTENT_LENGTH to determine how much input there is;
2. Call getchar() or another file-reading function in a loop to read all of the input;
3. Process the input as required;
4. Use the "Content-Type:" header to tell the web server the format of the output;
5. Send the output to the web server with printf(), putchar(), or another file-writing function.

In short, the main task of a CGI program is to get input from the web server, process it, and send the result back to the web server.

II. Environment variables

Environment variables are text strings (name/value pairs) that can be set by the OS shell or by other programs and read by other programs. They are a simple means for the web server to pass data to CGI programs. A program inherits them as part of its environment when it is started, so the CGI program can read whatever the server has set.

The following environment variables are often used in CGI programming.

HTTP_REFERER: The URL of the web page from which the CGI program was called.
REMOTE_HOST: The machine name and domain name of the host running the web browser that called the CGI program.
REQUEST_METHOD: The method the web server uses to pass data to the CGI program, either GET or POST. GET passes data only through environment variables (such as QUERY_STRING), while POST passes data through both environment variables and standard input, so POST can conveniently carry more data.
SCRIPT_NAME: The name of the CGI program.
QUERY_STRING: When the GET method is used, the form data is placed in QUERY_STRING and passed to the CGI program.
CONTENT_TYPE: The MIME type of the data passed to the CGI program, usually "application/x-www-form-urlencoded". This is the encoding used to pass data from an HTML form to the CGI program with the POST method, known as URL encoding.
CONTENT_LENGTH: The number of characters (bytes) of data passed to the CGI program.

In a C program, environment variables are read with the getenv() library function. For example:

    if (getenv("CONTENT_LENGTH"))
        n = atoi(getenv("CONTENT_LENGTH"));

Note that getenv() is called twice here: the first call checks whether the variable exists, and the second uses its value. The check matters because getenv() returns a NULL pointer when the named variable does not exist; using that result without checking it would crash the CGI program. Saving the pointer from a single call and testing it for NULL works equally well.

III. Parsing and decoding form input

1. Parsing name/value pairs

When a user submits an HTML form, the web browser first encodes the form data as name/value pairs and sends it to the web server, which then passes it to the CGI program. The format is as follows:

    name1=value1&name2=value2&name3=value3&name4=value4&…

Each name is the name defined on a form element such as INPUT, SELECT, or TEXTAREA, and each value is what the user entered or selected. This format is URL encoding, which the program has to parse and decode.

To parse this data stream, the CGI program must first split it into name/value pairs. It does so by looking for two characters in the input: every = marks the end of a form variable name, and every & marks the end of a form variable value.
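The splitting rule just described can be sketched as follows. This is a minimal illustration rather than code from the article: the function name next_pair and its interface are assumptions made for the example, and it splits the string in place without decoding + or %xx.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the article's programs.
 * Splits one "name=value" pair off the front of *cursor, in place.
 * Returns 1 and sets *name and *value when a pair was found, 0 at the end. */
int next_pair(char **cursor, char **name, char **value)
{
    char *p = *cursor;
    char *amp, *eq;

    if (p == NULL || *p == '\0')
        return 0;                    /* nothing left to split */

    amp = strchr(p, '&');            /* '&' marks the end of a value */
    if (amp != NULL) {
        *amp = '\0';
        *cursor = amp + 1;           /* next pair starts after the '&' */
    } else {
        *cursor = p + strlen(p);     /* last pair: no trailing '&' */
    }

    eq = strchr(p, '=');             /* '=' marks the end of a name */
    if (eq != NULL) {
        *eq = '\0';
        *value = eq + 1;
    } else {
        *value = p + strlen(p);      /* no '=': treat the value as empty */
    }
    *name = p;
    return 1;
}
```

A caller would typically copy the query into a writable buffer, point a cursor at it, and call next_pair() in a loop until it returns 0. The names and values it yields are still URL-encoded.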
Note that the value of the last variable in the input is not followed by &.

Once the name/value pairs have been separated, certain special characters in the input must be converted back into the corresponding ASCII characters. These special characters are:

+: converted into a space character;
%xx: a special character given by its hexadecimal ASCII code; the value xx is converted into the corresponding ASCII character.

This conversion must be applied to both form variable names and form variable values. Below is a CGI program that parses the form data and sends the result back to the web server.

    #include <stdio.h>
    #include <stdlib.h>

    int htoi(char *);

    int main(void)
    {
        int i, n;
        char c;

        printf("Content-Type: text/plain\n\n");
        n = 0;
        if (getenv("CONTENT_LENGTH"))
            n = atoi(getenv("CONTENT_LENGTH"));
        for (i = 0; i …

A CGI program can also generate HTML output, as the following program does:

    #include <stdio.h>

    int main(void)
    {
        printf("Content-Type: text/html\n\n");
        printf("<html>\n");
        printf("<head><title>An HTML Page From a CGI</title></head>\n");
        printf("<body>\n");
        printf("<h2>This is an HTML page generated from within a CGI program...</h2>\n");
        printf("<hr><p>\n");
        printf("<a href=\"../output.html#two\"><b>Go back to output.html page</b></a>\n");
        printf("</body>\n");
        printf("</html>\n");
        fflush(stdout);
        return 0;
    }

The CGI program above simply uses printf() to generate the HTML source. Note that any double quotation mark inside an output string must be preceded by a backslash: the whole HTML string is already enclosed in double quotation marks, so a double quotation mark within it has to be escaped.

In HTML, when the user fills in a form and presses the submit button, the content of the form is sent to the server. A server-side script is generally needed to process that content: to save it, to run a query based on it, or to do something else.
Without CGI, the Web would lose its interactivity entirely, and all information would flow one way with no feedback. Some people think JavaScript can replace CGI programs. This is a conceptual error. JavaScript runs only in the client's browser, while CGI works on the server. Their work overlaps in places, such as form data validation, but JavaScript can never replace CGI. What can be said is that when a job can be done with either JavaScript or CGI, JavaScript should be used: it has an inherent speed advantage because it runs in the browser without a round trip to the server. CGI should be reserved for problems that cannot be solved on the client side, such as interacting with a remote database.

Simply put, CGI is an interface for communication between HTML forms and server-side programs. Calling it an interface means that CGI is not a language but a set of specifications that other languages can follow. In theory, any programming language can be used to write CGI programs, as long as the program conforms to what the CGI specification defines. Because C is highly portable (almost every platform has a C compiler) and is familiar to most programmers (unlike Perl), C is one of the preferred languages for CGI programming. Here we introduce how to write CGI programs in C.

The simplest example of CGI programming is form processing, so this article mainly covers how to write CGI programs in C for form processing.

GET form processing

For forms that use the attribute "METHOD=GET" (or that have no METHOD attribute, since GET is the default), CGI specifies that when the form is sent to the server, the form data is stored in an environment variable called QUERY_STRING on the server.
Processing this kind of form is relatively simple: it is enough to read the environment variable. How that is done differs between languages. In C, the library function getenv (declared in the standard header stdlib.h) returns the value of an environment variable as a string. Once the data is available as a string, converting it to other types is straightforward.

The standard output of a CGI program (the stdout stream in C) is also redefined. It does not produce any output on the server; instead it is redirected to the client's browser. So if a C CGI program writes an HTML document to stdout, that document is displayed in the client's browser. This is a basic principle of CGI programs.

Let's look at a concrete implementation. The following is an HTML form:

Please fill in the multiplier and the multiplicand below, then click OK to see the result.<BR>

The function we want to implement is very simple: multiply the two values entered in the form and output the result. This could in fact be done with JavaScript, but to keep the program as simple and easy to understand as possible, this small multiplication is used as the example.

The following is the CGI program that processes this form. It corresponds to the value of the ACTION attribute in the form tag.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char *data;
        long m, n;

        printf("%s%c%c\n", "Content-Type:text/html;charset=gb2312", 13, 10);
        printf("<TITLE>Multiplication result</TITLE>\n");
        printf("<H3>Multiplication result</H3>\n");
        data = getenv("QUERY_STRING");
        if (data == NULL)
            printf("<P>Error! No data was entered, or there was a problem transmitting the data.");
        else if (sscanf(data, "m=%ld&n=%ld", &m, &n) != 2)
            printf("<P>Error! The input is invalid. The form fields must contain numbers.");
        else
            printf("<P>The product of %ld and %ld is: %ld.", m, n, m * n);
        return 0;
    }

The C syntax itself needs little discussion, so let's look at what is special about this program as a CGI program. As mentioned earlier, whatever is written to standard output is what the browser displays. The first line of output is required and is specific to CGI programs: printf("%s%c%c\n", "Content-Type:text/html;charset=gb2312", 13, 10) writes the header of the response. A CGI program can send the browser not only HTML text but also images, sound, and so on, and this line tells the browser how to handle the content it receives. The Content-Type line must be followed by a blank line: the characters 13 and 10 (CR LF) end the header line, and the final \n produces the empty line. This is indispensable. Because the header output of all CGI programs is similar, a function can be defined for it to save programming time. This is a commonly used technique in CGI programming.
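As a rough sketch of such a header function, the two helpers below write the Content-Type line followed by the blank line that ends the header. The names format_cgi_header and print_cgi_header are made up for this example and are not part of any CGI library; the sketch ends each line with CR LF instead of the CR LF plus \n used in the article's program.

```c
#include <stdio.h>

/* Hypothetical helpers, not from the article.
 * format_cgi_header builds "Content-Type: <type>" plus the blank line
 * that ends the header section; print_cgi_header writes it to stdout. */
int format_cgi_header(char *buf, size_t size, const char *content_type)
{
    return snprintf(buf, size, "Content-Type: %s\r\n\r\n", content_type);
}

int print_cgi_header(const char *content_type)
{
    return printf("Content-Type: %s\r\n\r\n", content_type);
}
```

A CGI program would call print_cgi_header("text/html") once, before writing any other output.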
The program then calls the library function getenv to get the content of QUERY_STRING and uses sscanf to extract the value of each parameter. Note how sscanf is used here. Otherwise the program is no different from an ordinary C program.

After compiling the program, rename it to mult.cgi and place it in the /cgi-bin/ directory, where the form can call it. That completes a CGI program that processes a GET form.

POST form processing

Now consider the other way of transmitting a form: POST. Suppose the task is to append a piece of text entered by the user in a form to the end of a text file on the server. This can be seen as the prototype of a message board program. Clearly this cannot be done with a client-side script such as JavaScript, so it is a CGI program in the true sense.

The problem looks very similar to the previous one, just with a different form and a different script (program). In fact there are differences. In the example above, GET processing is a "pure query": it does not depend on or change any state, so the same data can be submitted any number of times without causing problems (apart from a little overhead on the server). The new task is different, because it changes the content of a file, so it is state-related. This is one of the differences between POST and GET. In addition, GET limits the length of the form data while POST does not, which is the main reason for choosing POST for this task. On the other hand, GET is generally regarded as slightly faster to process than POST.

In the CGI definition, the content of a POST form is sent to the standard input of the CGI program (stdin in C), and the length of the transmitted data is placed in the environment variable CONTENT_LENGTH.
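A minimal sketch of reading a POST body under that rule might look like this. The helper read_body and its parameters are assumptions made for illustration: it takes the text of CONTENT_LENGTH and an input stream as arguments so it can be exercised outside a web server, and it refuses lengths above a caller-chosen maximum.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not from the article.
 * Reads exactly the number of bytes named by length_text (the value of
 * CONTENT_LENGTH) from in. Returns a NUL-terminated buffer that the caller
 * must free, or NULL if the length is missing, malformed, larger than
 * max_len, or if fewer bytes arrive than promised. */
char *read_body(FILE *in, const char *length_text, size_t max_len, size_t *out_len)
{
    char *end, *body;
    unsigned long len;

    if (length_text == NULL)
        return NULL;
    len = strtoul(length_text, &end, 10);
    if (end == length_text || *end != '\0' || len > max_len)
        return NULL;

    body = malloc(len + 1);
    if (body == NULL)
        return NULL;
    if (fread(body, 1, len, in) != len) {   /* never read past the stated length */
        free(body);
        return NULL;
    }
    body[len] = '\0';
    *out_len = len;
    return body;
}
```

In a CGI program the call would be along the lines of read_body(stdin, getenv("CONTENT_LENGTH"), 80, &len). Passing the getenv() result straight in is safe here because the helper checks for NULL itself.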
So what we have to do is read a string of CONTENT_LENGTH bytes from standard input. Reading data from standard input sounds easier than reading it from environment variables, but it is not. There are some details to watch, which can be seen in the program below. One point needs special attention: CGI programs differ from ordinary programs. An ordinary program gets an EOF mark after reading to the end of a file stream, but when a CGI program processes a form, EOF may never appear. So do not read more characters than CONTENT_LENGTH specifies; what happens otherwise is unknown, because the CGI specification does not define it and different servers generally handle it differently.

Let's see how a CGI program collects data from a POST form. Here is a relatively simple C source file:

    #include <stdio.h>
    #include <stdlib.h>

    #define MAXLEN 80
    #define EXTRA 5
    /* 4 bytes are reserved for the field name "data", 1 byte for "=" */
    #define MAXINPUT (MAXLEN + EXTRA + 2)
    /* 1 byte is reserved for the added newline, 1 for the terminating NUL */
    #define DATAFILE "../data/data.txt"
    /* the file to which data is appended */

    void unencode(char *src, char *last, char *dest)
    {
        for (; src != last; src++, dest++)
            if (*src == '+')
                *dest = ' ';
            else if (*src == '%') {
                unsigned int code;
                if (sscanf(src + 1, "%2x", &code) != 1)
                    code = '?';
                *dest = (char)code;
                src += 2;
            } else
                *dest = *src;
        *dest = '\n';
        *++dest = '\0';
    }

    int main(void)
    {
        char *lenstr;
        char input[MAXINPUT], data[MAXINPUT];
        long len;

        printf("%s%c%c\n", "Content-Type:text/html;charset=gb2312", 13, 10);
        printf("<TITLE>Response</TITLE>\n");
        lenstr = getenv("CONTENT_LENGTH");
        if (lenstr == NULL || sscanf(lenstr, "%ld", &len) != 1 || len > MAXLEN)
            printf("<P>Form submission error");
        else {
            FILE *f;

            fgets(input, len + 1, stdin);
            unencode(input + EXTRA, input + len, data);
            f = fopen(DATAFILE, "a");
            if (f == NULL)
                printf("<P>Sorry, unexpected error, unable to save your data");
            else {
                /* close the file only if it was actually opened */
                fputs(data, f);
                fclose(f);
                printf("<P>Thank you very much, your data has been saved<BR>%s", data);
            }
        }
        return 0;
    }

Essentially, the program first obtains the length of the data from the CONTENT_LENGTH environment variable and then reads a string of that length. Because the data is encoded for transmission, it must be decoded accordingly. The encoding rules are simple. The main ones are as follows:

1. Each field in the form is represented by the field name, followed by an equals sign, followed by the field's value; fields are joined with &;
2. All spaces are replaced by plus signs, so a literal space never appears in the encoded data;
3. Special characters such as punctuation marks, and characters with a special meaning such as "+", are represented by a percent sign followed by the corresponding ASCII code value in hexadecimal.

For example, if the user enters: Hello there! the data is encoded for transmission to the server and becomes data=Hello+there%21. The unencode() function above decodes the encoded data. After decoding, the data is appended to the end of the data.txt file and echoed back to the browser.

After the file is compiled, rename it to collect.cgi and put it in the CGI directory so the form can call it. The corresponding form is given below:

Please enter your message (maximum 80 characters):
<INPUT NAME="data" SIZE="60" MAXLENGTH="80"><BR>
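The decoding rules described above (plus sign to space, %xx to the character with that hexadecimal code) can also be sketched as a standalone function. This is an illustrative variant, not the article's unencode(): the name url_decode is an assumption, it works on a length instead of an end pointer, and it copies a malformed or truncated % escape through unchanged where unencode() substitutes a question mark.

```c
#include <stddef.h>

/* Returns the value of one hexadecimal digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper, not from the article.
 * Decodes src[0..len) into dest, which must hold at least len + 1 bytes.
 * Returns the decoded length; dest is NUL-terminated. */
size_t url_decode(const char *src, size_t len, char *dest)
{
    size_t i, out = 0;

    for (i = 0; i < len; i++) {
        if (src[i] == '+') {
            dest[out++] = ' ';
        } else if (src[i] == '%' && i + 2 < len &&
                   hex_value(src[i + 1]) >= 0 && hex_value(src[i + 2]) >= 0) {
            dest[out++] = (char)(hex_value(src[i + 1]) * 16 + hex_value(src[i + 2]));
            i += 2;                      /* skip the two hex digits */
        } else {
            dest[out++] = src[i];        /* ordinary character or malformed escape */
        }
    }
    dest[out] = '\0';
    return out;
}
```

Because decoding never produces more bytes than it consumes, a destination buffer of len + 1 bytes is always large enough.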

In fact, this program is only an example and is not fit for real use. It ignores a critical problem: when several users write to the file at the same time, errors can occur, and for a program like this the chance of simultaneous writes is high. A more serious message board program therefore needs further measures, such as a semaphore or a lock file. That is a matter of programming technique and is not covered here.

Finally, let's write a CGI program for viewing the data.txt file. It only needs to write the file's content to stdout:

    #include <stdio.h>
    #include <stdlib.h>

    #define DATAFILE "../data/data.txt"

    int main(void)
    {
        FILE *f = fopen(DATAFILE, "r");
        int ch;

        if (f == NULL) {
            printf("%s%c%c\n", "Content-Type:text/html;charset=gb2312", 13, 10);
            printf("<TITLE>Error</TITLE>\n");
            printf("<P><EM>Unexpected error, unable to open file</EM>");
        } else {
            printf("%s%c%c\n", "Content-Type:text/plain", 13, 10);
            while ((ch = getc(f)) != EOF)
                putchar(ch);
            fclose(f);
        }
        return 0;
    }

The only thing to note about this program is that it does not wrap data.txt in HTML before sending it but outputs it directly as plain text. It is enough to use text/plain in place of text/html in the header; the browser chooses how to handle the content according to the Content-Type. Triggering this program is also simple. Since there is no data to enter, a single button is enough:

<INPUT TYPE="SUBMIT" VALUE="Check">
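For the concurrent-write problem noted for collect.cgi, one possible approach on POSIX systems is advisory file locking with fcntl(). This is a different technique from the semaphore or lock file the article mentions, and the helper name append_locked is invented for this sketch. It only protects against other writers that take the same lock.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not from the article (POSIX only).
 * Appends text to path while holding an exclusive advisory lock on the
 * whole file, so cooperating writers take turns. Returns 0 on success. */
int append_locked(const char *path, const char *text)
{
    struct flock lock;
    size_t len = strlen(text);
    int fd, ok;

    fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    memset(&lock, 0, sizeof lock);
    lock.l_type = F_WRLCK;       /* exclusive (write) lock */
    lock.l_whence = SEEK_SET;    /* l_start = 0 and l_len = 0: the whole file */

    if (fcntl(fd, F_SETLKW, &lock) < 0) {   /* wait until the lock is granted */
        close(fd);
        return -1;
    }

    ok = (write(fd, text, len) == (ssize_t)len);

    lock.l_type = F_UNLCK;       /* release the lock before closing */
    fcntl(fd, F_SETLK, &lock);
    close(fd);
    return ok ? 0 : -1;
}
```

In collect.cgi, a call such as append_locked(DATAFILE, data) could stand in for the fopen(), fputs(), and fclose() sequence.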

That covers the basic principles of writing CGI programs in C. Of course, these alone are not enough to write a good CGI program; that requires further study of the CGI specification and of other techniques specific to CGI programming.

The purpose of this article is to explain the concept of CGI programming. In practice, today's mainstream server-side scripting languages such as ASP, PHP, and JSP provide most of what CGI programming offers, and programming with them is much easier than writing CGI in any language. Server-side work therefore generally starts with these scripting languages, and CGI is used only when they cannot solve the problem, for example when lower-level programming is required.
