Internet Publishing - PPI
"Materials used in this course are the property of the author. These
lessons may be used only by course participants for self-study purposes.
Application for permission to use these materials for other educational purposes
such as for teaching or as a basis for teaching should be directly submitted to
the author."
In this lesson, we will look at configuring an HTTP server. As examples, we will use win-httpd 1.4 and WebSite. Some web servers for Windows (for example, WebSite) have their own configuration programs, so the configuration process can differ greatly from server to server, but the functions ought to be approximately the same.
For Windows 3.x users: win-httpd is the Windows version of NCSA's httpd for UNIX, so the two are configured in almost the same way. Configuration is done by editing text files.
For Windows95 users: In WebSite, configuration is done by menus in the configuration program "WebSite Server Properties".
Even though you may not run your own server in the future, it is useful to know
what kinds of options exist, so that you know what you can ask of (or demand
from) your provider. If you run the server yourself, you have full control and a
lot of flexibility. If you do not run the server yourself, but only publish pages
on a commercial server (for example, CompuServe or EUnet), you have less
flexibility. Everything beyond publishing the actual pages as static ("dead")
documents must be done in partnership with your provider. This can seem like a
hindrance, but it doesn't need to be. Serious providers have built up competence
in this area and are able to provide fast and helpful assistance.
We now find ourselves in Chapter 5 of the book.
Comparable information for httpd 1.4 can be found on Win 3.x's machine and on Per's practice machine.
A little configuration information about WebSite can be found on WebSite's server and on Per's practice machine.
We will try to make the same (or almost the same) configuration on the two
web servers, win-httpd 1.4 and WebSite. The configuration process for the two
servers is very different. In this lesson, you will find some general comments on
what needs to be configured, as well as separate paragraphs for each type of
server explaining how to do it.
Many of the commands in the configuration files specify paths to
directories or files. These paths can be of two types:
Configuration files should be kept secret. Dishonest people may read information from the configuration files which gives them access to backdoors and security loopholes. No one besides the person running the server needs to see these files, so don't make them accessible from the web tree (the "better safe than sorry" principle).
Configuration in win-httpd 1.4 is done by setting variables in several
configuration files. These are plain ASCII files which you edit with a
text editor (Notepad in Windows or edit in DOS). Using Word
would be unnecessary and laborious.
As delivered, the files contain only 7-bit characters (American ASCII). I
may be unnecessarily conservative, but I try to avoid using special national
characters (in Norway we have the characters æøåÆØÅ; I don't know how these
characters appear in Greece or in the Netherlands). Even though one should in
principle be able to write whatever one wishes in comment lines (lines that begin
with #), one ought not take unnecessary chances. I also use only 7-bit characters
in file names and directory names. ("He who laughs last...")
There are four files which decide the configuration:
The following rules apply for all files:
When you start the server, all files are read into the memory of the
machine. If you make changes to the configuration files, you must stop the
server and start it again for the changes to take effect.
Files and paths must be written as in UNIX, that is with a slash (/), and
not like in MS-DOS with a backslash (\).
The connection between virtual paths and physical paths is made with the
Alias directive in srm.cnf (see below).
In WebSite, there is a program called WebSite Server Properties with its own menu for configuring the server. In this program, virtual paths are written in UNIX format with "/", while the physical paths are written in MS-DOS format with "\".
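Because win-httpd wants slashes and WebSite wants backslashes for physical paths, it can help to convert between the two forms. The sketch below uses Python's standard pathlib; the function names to_httpd_path and to_dos_path are hypothetical helpers, not part of either server.

```python
from pathlib import PureWindowsPath


def to_httpd_path(dos_path: str) -> str:
    """MS-DOS form (backslashes) -> the slash form win-httpd's files expect."""
    return PureWindowsPath(dos_path).as_posix()


def to_dos_path(httpd_path: str) -> str:
    """Slash form -> the backslash form WebSite uses for physical paths.

    PureWindowsPath accepts either separator and prints backslashes.
    Note that a trailing separator is dropped.
    """
    return str(PureWindowsPath(httpd_path))


print(to_httpd_path(r"c:\usr\www"))  # c:/usr/www
print(to_dos_path("c:/usr/www"))     # c:\usr\www
```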
If you have installed win-httpd in the standard manner, you should see the httpd.cnf file here. As you can see, it contains many comments. With "ServerAdmin" you can define the administrator of the server. If there is anything you don't understand or are unsure about, check the book or the manual at: /httpddoc/setup/httpd/Overview.html.
In this file, you change ServerAdmin so that your own e-mail address
is given in case of error messages.
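As a rough illustration of how such a file is laid out, the sketch below reads "Directive value" lines while skipping blank lines and # comments. It is a simplified model, not win-httpd's own parser, and the address you@example.com is a placeholder.

```python
def read_directives(text: str) -> dict[str, str]:
    """Collect 'Directive value' pairs from an httpd.cnf-style file.

    Blank lines and lines starting with # are skipped. If a directive
    appears twice, the last value wins in this simplified sketch.
    """
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return directives


SAMPLE = """# httpd.cnf (hypothetical fragment)
ServerAdmin you@example.com
ServerName pb1.idb.hist.no
"""

print(read_directives(SAMPLE)["ServerAdmin"])  # you@example.com
```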
In WebSite Server Properties, you choose the General "card"
where you can specify the e-mail address for the administrator.
ServerName may also need to be changed. The machines I work with
are, as you probably already know, called pb1.idb.hist.no (httpd 1.4)
and pb.idb.hist.no (WebSite). These are boring names for machines. In
addition, the names are tied to the machines, which means that if I get a new
machine, I will also get a new address. It would probably be wiser to tell the
local name service on the Internet (DNS, the Domain Name System) that
pb1.idb.hist.no has an alias, for example, ppi.idb.hist.no. Then all requests
from outside which are directed to ppi.idb.hist.no would be sent to
pb1.idb.hist.no. In the same way, users would be able to point to a web server
with a more logical name. You have certainly noticed that most serious web
servers have defined aliases which start with www.
Example: You probably accessed the lesson for this course from the
server
www.idb.hist.no. This is a UNIX machine which is actually called
astfgl.idb.hist.no. You can try using the latter variant just to see if
you get the same results.
Back to the local configuration of your httpd. If there is an alias for a web
server in the Internet's name service, your httpd should also use this alias when
it says who it is. That is, ServerName ought to be set to the same alias as
the one registered in DNS. If no alias has been created for your machine, use
the machine's own domain name. You will find this in your machine's TCP/IP
set-up.
Those of you who are running a modem connection to the Internet will not,
as a rule, be given a permanent domain name. You can choose any name you wish as
you will be the only person with access to this server.
In WebSite Server Properties you perform the same operation on
the Identity "card". Here, you enter the domain name or IP
address of your machine.
NB! If you use a freestanding machine without a permanent IP number (for
example, you are connected via modem), you ought to use localhost in
place of the domain name.
All web servers generate log files of what happens on the server. We will
take a closer look at this when we come to server statistics.
For httpd 1.4, HTTPD.CNF will contain information about logging. Here, the
names of the other configuration files are also defined. Therefore, this is the
foundation for the configuration files (the mother of all configuration files).
In WebSite, the logging configuration is changed on the Logging card
of the WebSite Server Properties program.
In the beginning, there was 7-bit ASCII. E-mail was used to send text messages written in American English. Everything was fine. Then people began sending foreign characters such as æøåÆØÅ, the demand for knowledge increased, multimedia came into being and people wanted to send both apples and SNAKES over the Net. The solution was MIME: Multipurpose Internet Mail Extensions. MIME describes what is being sent and how it is encoded. MIME types make it possible for the client to start a program which can display the transferred file: a sound bite can be played by a sound program, and so on.
The file mime.typ connects file endings (the last part of the file name)
to MIME types; for example, files which end in .txt are of the type
text/plain.
The most basic MIME type is application/octet-stream. If a client
receives a file of this type, it can't display it in any way; it normally asks
the user where to save it. In my mime.typ I have added some file extensions of
this type so that you are able to download these file types from my server.
application/octet-stream bin exe dll
On the client, you can make a corresponding connection between file
ending and viewer.
Read more about this in the win-httpd manual:
/httpddoc/setup/typesfor.html
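The mapping in mime.typ can be modelled as a lookup table from file ending to type. The sketch below parses lines of the form shown above ("type ext ext ...") and looks up a file name. The sample table is hypothetical, and the text/plain fallback for unknown endings is an assumption made for this sketch; the real server's default may be configured differently.

```python
def parse_mime_types(text: str) -> dict[str, str]:
    """Build {extension: mime_type} from 'type ext ext ...' lines."""
    ext_to_type = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment
        mime_type, *extensions = fields
        for ext in extensions:
            ext_to_type[ext.lower()] = mime_type
    return ext_to_type


def mime_type_for(filename: str, table: dict[str, str],
                  default: str = "text/plain") -> str:
    """Look up the type from the last part of the file name."""
    _, dot, ext = filename.rpartition(".")
    if not dot:
        return default  # no file ending at all
    return table.get(ext.lower(), default)


SAMPLE_MIME_TYP = """# hypothetical excerpt
text/html html htm
text/plain txt
application/octet-stream bin exe dll
"""

table = parse_mime_types(SAMPLE_MIME_TYP)
print(mime_type_for("setup.exe", table))  # application/octet-stream
```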
Now we will look at configuring where the documents are located on the
hard disk in relation to the URLs (virtual paths) which the clients use to get to
the files.
It is not always desirable to place all documents in one
directory, for example, c:/httpd/htdocs. It is desirable to have a
set-up which is independent of which server program is used. One year I may
decide to run another web server which doesn't have a directory called
httpd/htdocs. It is also easier to keep my files separate from the web server's
files when making back-ups.
In the file srm.cnf important
things are defined, such as DocumentRoot and Alias (and a few
other things).
DocumentRoot tells the server what the root of your web tree is. I have
not changed this from the standard set-up. Instead, I have used
alias to explain where my web files can be found:
Alias /~per/ c:/usr/www/
Alias /files/ c:/usr/files/
The alias command is written as Alias fakename realname. Fakename is
the virtual path which the client gives in the URL. Realname is the
absolute physical path. With the help of the alias command, you can take
the physical tree on your hard drive, cut it up and graft it
together again into a virtual tree which becomes your web.
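To make the grafting concrete, here is a minimal sketch of how a URL path could be translated with the two Alias lines above. The DocumentRoot value c:/httpd/htdocs is assumed from the standard set-up mentioned earlier. This sketch uses the first matching alias in file order; whether win-httpd picks the first or the longest match is not stated in this lesson.

```python
ALIASES = [  # (fakename, realname), as in the Alias lines above
    ("/~per/", "c:/usr/www/"),
    ("/files/", "c:/usr/files/"),
]
DOCUMENT_ROOT = "c:/httpd/htdocs"  # assumption: the unchanged standard root


def resolve(url_path: str, aliases=ALIASES,
            document_root: str = DOCUMENT_ROOT) -> str:
    """Translate a virtual path (from the URL) into a physical path."""
    for fake, real in aliases:
        if url_path.startswith(fake):
            # Replace the virtual prefix with the physical directory.
            return real + url_path[len(fake):]
    # No alias matched: the path is taken relative to DocumentRoot.
    return document_root + url_path


print(resolve("/~per/index.htm"))  # c:/usr/www/index.htm
```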
Redirect is used to give a message that your document has moved.
In WebSite Server Properties you choose the Mapping card for
so-called "mapping" functions. Here, you can define connections
between virtual and physical paths. The "Document Root" or URL is also
defined.
Example: Fredrik had an idea which he described in the file didrik.html.
Later on, the idea became a project, and it was wise to move the file to a
project directory, /proj/didrik/. Therefore, he added a directive like
the following:
Redirect /~fredrik/ideas/didrik.html http://pc130.idb.hist.no/~fredrik/proj/didrik/didrik.html
If someone asks to see didrik.html under ideas, a message would be sent back saying that the file has moved to a new address. The client automatically picks up this message and asks to be sent to the new address. All this occurs behind the scenes. The user doesn't see these messages which are sent to and fro.
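The exchange behind the scenes can be sketched as a table lookup that answers with a status code and a Location header pointing at the new address. Two assumptions here: the status code 302 ("Found") is a guess, since the lesson only says a "moved" message is sent, and only exact path matches are handled. The function name redirect_for is hypothetical.

```python
from typing import Optional

REDIRECTS = {  # old virtual path -> new absolute URL (from the example above)
    "/~fredrik/ideas/didrik.html":
        "http://pc130.idb.hist.no/~fredrik/proj/didrik/didrik.html",
}


def redirect_for(url_path: str) -> Optional[tuple[int, dict[str, str]]]:
    """Return (status, headers) for a moved document, or None if not moved."""
    target = REDIRECTS.get(url_path)
    if target is None:
        return None
    # Assumption: a temporary-redirect status; the client then requests
    # the URL in the Location header without the user seeing anything.
    return 302, {"Location": target}
```

A client that receives this answer simply issues a new request for the Location URL, which is why the user never sees the intermediate message.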
If a client requests a URL which points to a directory, that is, the
URL ends with a slash (/), and no index.htm or index.html file
can be found in this directory, the web server will generate an HTML document
listing the contents of the directory. (The document is not saved, but is
generated on each request.) You have a lot of leeway with regard to what this
document will look like.
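The rule above can be sketched as: serve the index file if it exists, otherwise build a listing on the fly. The sketch assumes a single configurable index name (index.htm by default) and a deliberately plain page layout; it is not the server's actual output format.

```python
import html
from pathlib import Path
from urllib.parse import quote


def render_listing(names: list[str]) -> str:
    """Build a simple HTML page that links to each entry in a directory."""
    items = [
        # quote() makes names like '#readme.txt' safe inside HREF;
        # html.escape() makes them safe as visible text.
        f'<LI><A HREF="{quote(name)}">{html.escape(name)}</A>'
        for name in sorted(names)
    ]
    return (
        "<HTML><HEAD><TITLE>Directory listing</TITLE></HEAD><BODY><UL>\n"
        + "\n".join(items)
        + "\n</UL></BODY></HTML>"
    )


def directory_response(directory: Path, index_name: str = "index.htm") -> str:
    """Serve the index file if present, otherwise generate a listing."""
    index_file = directory / index_name
    if index_file.is_file():
        return index_file.read_text()
    # Nothing is written to disk: the listing is rebuilt on every request.
    names = [p.name + ("/" if p.is_dir() else "") for p in directory.iterdir()]
    return render_listing(names)
```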
I have a directory on the httpd 1.4 server where this is illustrated:
httpd 1.4 files.
On the WebSite server, you will find an example at:
WebSite files.
If you have a file called #readme.htm or #readme.txt, it
will be included in the document. That is, the file will appear in the
file listing and its contents will be displayed on the client. In
this way, you can describe the contents of the directory for the users. If you
want this file to be called something other than
#readme, you can use the directive ReadmeName in the local
srm.ctl file.
You can set the name of the file which becomes the standard index of a
directory. The most usual is index.html on UNIX machines
and index.htm on MS-DOS/Windows machines. A problem arises when these
two worlds meet. If we have a tree (or a forest) of web documents which is
visible from both MS-DOS/Novell and UNIX, and both UNIX and MS-DOS servers are
running, problems arise. We can configure the servers to recognize both .htm and
.html as HTML documents, but we can only ask them to recognize either
index.htm or index.html as the standard file for a directory. Since
MS-DOS file names can't have a four-letter extension such as .html, the solution
must be to use index.htm on both systems. However, it is difficult to get UNIX
people to limit themselves to an MS-DOS name. The best solution is perhaps to use
a new neutral name which everyone can accept, for example, index. (Since MS-DOS
is now considered dead due to the introduction of Windows95, this discussion is
perhaps meaningless.)
The automatic index will go into the individual HTML files and pick out <TITLE>the
title of the document</TITLE>. There is a bug in win-httpd: the TITLE tags
must be written in uppercase letters. I have reported this defect, but have
only received a semi-automatic answer back.
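For comparison, extracting the title case-insensitively is straightforward. This sketch (the function name extract_title is hypothetical) accepts <title>, <TITLE> or any mix, and collapses line breaks and repeated spaces inside the title.

```python
import re
from typing import Optional

# IGNORECASE avoids the uppercase-only problem; DOTALL lets titles span lines.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html_text: str) -> Optional[str]:
    """Return the document title with whitespace collapsed, or None."""
    match = TITLE_RE.search(html_text)
    return " ".join(match.group(1).split()) if match else None


print(extract_title("<html><head><title>My notes</title></head>"))  # My notes
```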
If there are files which are not html files, you must write a description
of them and create the file #haccess.ctl in the same directory as the
file. An element in my #haccess.ctl file from the "/files/"
directory looks something like this:
AddDescription `PaintShopPro for MSWindows 3.x` psp311.zip
Make sure that the ` characters point in the right direction. It should also be possible to use regular quotation marks (").
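The sketch below pulls file descriptions out of #haccess.ctl lines like the one above. It is an approximation: the lesson only shows backquotes, so accepting double or single quotes as delimiters as well is an assumption, and parse_descriptions is a hypothetical name.

```python
import re

# Assumption: description delimited by `, " or ' on each side.
ADD_DESCRIPTION = re.compile(
    r"""^\s*AddDescription\s+[`"'](?P<desc>.*?)[`"']\s+(?P<file>\S+)\s*$""",
    re.IGNORECASE,
)


def parse_descriptions(ctl_text: str) -> dict[str, str]:
    """Return {filename: description} from AddDescription lines."""
    descriptions = {}
    for line in ctl_text.splitlines():
        match = ADD_DESCRIPTION.match(line)
        if match:
            descriptions[match.group("file")] = match.group("desc")
    return descriptions


print(parse_descriptions("AddDescription `PaintShopPro for MSWindows 3.x` psp311.zip"))
```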
In WebSite Server Properties, choose the Dir Listing "card"
to configure this function. You can define names for the header, footer
and file description files. All of these files begin with the # sign so that they
will not appear in the file listing, for example, #header.html, #footer.html and
#filedesc.cnf. Try WebSite Files or
the demo page at Per's
Documentation.
The format for file descriptions is:
(space) comments
filename | description
filename | description
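Reading the format above as "lines that start with a space are comments; every other line is filename | description", a parser for #filedesc.cnf could look like the sketch below. That reading of the format is an assumption, and parse_filedesc is a hypothetical name.

```python
def parse_filedesc(text: str) -> dict[str, str]:
    """Return {filename: description} from 'filename | description' lines."""
    descriptions = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(" "):
            continue  # blank line, or comment (assumed: starts with a space)
        filename, sep, description = line.partition("|")
        if sep:
            descriptions[filename.strip()] = description.strip()
    return descriptions
```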
Earlier, I wrote about the different levels of web servers. I have now realized that one needs perhaps 4 levels:
It's not easy to know what should be placed in these categories. One must pay
attention to the needs of both oneself and the readers. If you have written down
some thoughts about something, and the notes lie unstructured and jumbled, with
unclear passages and large holes, a reader will need to spend a long time finding
out what your message is. It costs the reader a lot of time to get meaning from
the text. This is a form of pollution. This type of pollution is much more
visible in e-mail than on the WWW. A person who sends out a draft to all
employees costs the company many work hours, and the benefit is meager. A
much bigger document which contains irrelevant information (for example, an
invitation to the company soccer match) will, if it is clearly set up and easily
identifiable from the beginning, only be read by those who are interested.
Everyone else will delete it immediately.
This holds true for the Web as well. If you have something to share, share
it, but make the status of the document very clear and let the document
identify itself. (A document which contains a lot of misspellings, lacks an
introduction and formal identification of the author, etc. should be considered a
working document.) If you do not want to share it, hide it. This can be done by
configuring the server to limit access to parts of the web tree.
There are two methods for controlling access to information:
Example: On Per's practice machine, there is a directory which can only be
read from machines with the domain name ending with "idb.hist.no" (the
domain name for my department). Try
this. If you are using a machine outside my Department of Computer
Engineering, you will be sent a message saying that the area requested is
restricted. Both personal control and address control are
connected to the directories on the web server. All files in the same directory
will, therefore, have the same control. The control mechanism is based on two
criteria:
Notice that both 1 and 2 are used, and that the order can be configured.
Example: Below is an image for configuring WebSite. The example
illustrates address control.
Note that the clients permitted to read can be specified either by IP number
(those beginning with 199.182) or by domain name (those ending in
idb.hist.no). If domain names are used, the web server must be set up to look up
each client's name in DNS, which isn't always desirable. Therefore, the most
efficient way to control access is to use IP numbers.
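The two ways of naming permitted clients can be sketched as a simple allow check. This models only an allow list (the configurable allow/deny order is left out), treats "beginning with 199.182" as the network 199.182.0.0/16, and assumes the client's host name has already been found by a DNS lookup, which is exactly the extra cost mentioned above. is_allowed is a hypothetical name.

```python
import ipaddress
from typing import Optional

ALLOWED_NETWORKS = [ipaddress.ip_network("199.182.0.0/16")]
ALLOWED_DOMAINS = ["idb.hist.no"]


def is_allowed(client_ip: str, client_host: Optional[str] = None) -> bool:
    """Allow a client by IP prefix, or by domain if its name is known."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in ALLOWED_NETWORKS):
        return True  # cheap check: no DNS lookup needed
    if client_host:  # only available if the server looked the name up
        host = client_host.lower().rstrip(".")
        # Match whole domain labels, so "evilidb.hist.no" is not accepted.
        return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
    return False
```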
In httpd 1.4, there is a global
access.cnf file which configures
the global access to the server. In addition, we can have local #haccess.ctl
files in each directory. We shall, at this time, only look at how you can limit
access to a few directories to the users registered on your machine and who have
passwords.
I have a branch of the web tree on the practice PC pb1.idb.hist.no
with notes which are password protected. You can try this directory by using
the userid poi and the password mecpol.
Here is the method for creating protection: (The file #haccess.ctl
was created in this directory with the following contents.)
AuthUserFile c:/httpd/conf/lovbruk.pwd
AuthGroupFile c:/httpd/conf/empty.pwd
AuthName Examples
AuthType Basic
<Limit GET>
require user poi
</Limit>
AuthUserFile is the physical path to a file which contains an overview of users and passwords. It looks like this:
poi:Tx3uDv;2Io*Gc
Yes, this is an authentic image! If any of you break the code,
let me know. This is the type of
information which one usually doesn't want to make public. It contains the
userid (poi) and the password (encrypted). How did I make this file?
win-httpd includes a program called htpasswd.exe.
The User's Manual for the program is at
/httpddoc/setup/admin/UserManagement.html.
This is an MS-DOS program, and the file is created with the command:
htpasswd -c C:\httpd\conf\lovbruk.pwd poi
When the program is run with the -c option, it creates a new AuthUserFile
(password file). If you are adding a user to an existing password file, leave
out the -c option. After the option comes the path and file name of the password
file, and finally the userid for which you will set up a password.
When running, the password must be given twice. The program will now place
an encrypted version of the password in the file, as shown above.
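With AuthType Basic, the browser sends the userid and password base64-encoded in an Authorization header. The sketch below decodes that header, reads a password file in the userid:encrypted format shown above, and checks the "require user" list. It deliberately does not verify the password itself: that needs the same crypt scheme htpasswd used, which is outside this sketch. All function names are hypothetical.

```python
import base64
from typing import Optional


def parse_password_file(text: str) -> dict[str, str]:
    """Map each userid to its encrypted password ('userid:encrypted')."""
    users = {}
    for line in text.splitlines():
        user, sep, encrypted = line.strip().partition(":")
        if sep:
            users[user] = encrypted
    return users


def basic_credentials(authorization: str) -> Optional[tuple[str, str]]:
    """Decode an 'Authorization: Basic ...' value into (userid, password)."""
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("latin-1")
    except ValueError:  # malformed base64 (binascii.Error is a ValueError)
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


def user_is_required(authorization: str, users: dict[str, str],
                     required_users: set[str]) -> bool:
    """True if the userid is in the password file and the 'require user' list.

    NOTE: the password is NOT checked against the encrypted value here.
    """
    creds = basic_credentials(authorization)
    return creds is not None and creds[0] in users and creds[0] in required_users
```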
This access control applies for the directory where the #haccess.ctl file
lies and the entire tree underneath it. However, it is the physical tree that we
are talking about. If you have grafted together a virtual tree consisting of
many physical branches, each branch must be protected separately.
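The point about physical trees can be illustrated by collecting every #haccess.ctl on the physical path of a file, from the top down. Directories outside the protected branch get nothing, even if an alias grafts them into the same virtual tree. The directory c:/usr/www/notes is a hypothetical example, and how the server combines several access files is not covered by this lesson.

```python
from pathlib import PurePosixPath


def access_files_for(physical_path: str, dirs_with_ctl: set[str]) -> list[str]:
    """List the #haccess.ctl files that lie on a file's physical path."""
    directory = PurePosixPath(physical_path).parent
    chain = [directory, *directory.parents]  # the directory and its ancestors
    return [str(d / "#haccess.ctl")
            for d in reversed(chain)          # top-down order
            if str(d) in dirs_with_ctl]


protected = {"c:/usr/www/notes"}  # hypothetical protected branch
print(access_files_for("c:/usr/www/notes/2/a.htm", protected))
print(access_files_for("c:/usr/files/psp311.zip", protected))  # [] : not protected
```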
WebSite users can define users and passwords via WebSite Server
Properties and the "cards" Users and Access Control.
See the example above. The configuration program creates the necessary files for
you.
The answer to this exercise shall be:
This assignment is due 29 April 1997.