Intro to the Web Application Development Environment
I suppose the most basic part of a web application is the data itself.
As I said earlier, all web applications
allow a user to submit instructions on how the web application should
massage a bit of data on the web server. This might involve searching a
database, creating a shopping cart of products, or emailing some
information to someone.
Regardless, the data being manipulated all share some basic characteristics.
- Data have values
- Data have types
- Data have descriptions
- Data have formats
Data Values
To say that data have values may seem a bit obvious at first. After all, what is data if not a representation of something? That is, if there is no value, there is no data, right? Without a value there is just a blank page or an empty database.
Well okay, perhaps the statement is a bit obvious when taken at face value.
What is crucial to understand however, is the deeper significance of the statement. That is, data, to be useful, must have value to the consumer. It must be "information".
It cannot be said enough that on the web and within web applications, content is king. Whatever technology you use for storing, describing, searching, or modifying your data, it is crucial that the technology helps you to create information.
Data Types
Data types represent ways of categorizing different bits of data. At a most basic level, you can imagine two types of data: letters and numbers. Like any methodology for categorizing things, 'typing' your data helps make sense of it. In some cases, it also helps you store it more efficiently.
Actual data types include more useful categories (from a programmer's perspective) such as dates, ints, floats, and strings. For example, the moment you tell a program that a set of data consists of numbers and not just strings, the program knows it can perform numeric operations. If the program is an accounting package and it knows that the accounts store numeric data representing money, then debits and credits may be performed on the account data using standard number-based operations like addition or subtraction.
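As a minimal sketch of why typing matters, the Python snippet below treats the same characters first as strings and then as numbers. The variable names and amounts are made up for illustration, and Decimal is simply one common choice for representing money.

```python
from decimal import Decimal

# The same characters behave differently depending on their declared type.
balance_as_text = "100.50"
debit_as_text = "25.25"

# As strings, "+" glues the characters together instead of adding.
concatenated = balance_as_text + debit_as_text

# As Decimal values, ordinary arithmetic becomes available.
balance = Decimal(balance_as_text)
debit = Decimal(debit_as_text)
new_balance = balance - debit

print(concatenated)  # 100.5025.25
print(new_balance)   # 75.25
```

Once the program knows the values are numbers, the debit is a plain subtraction.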
Data Descriptions
As anyone knows, one of the biggest bummers about the information age is the fact that it tends to create information glut rather than information itself. That is, data quickly overwhelms the user's ability to make sense of it, and the consumer is buried.
One way that data harvesters help to solve this problem is to include meta descriptions which help consumers quickly place data into categories that can be filed away so that they may be found more quickly on the basis of the clues that the meta descriptions provide.
One example of a meta description is a set of cross-reference keywords. If you have the title of a book in a database, you might consider also applying keywords to the title that would help it match a search even if the exact word is not used.
Consider a possible query a person might want to make. Let's say that person wants to find all the books on cooking in a web store. The truth of the matter is that many books on cooking do not have the word "Cooking" actually in the title. So a title like "Scandinavian Cuisine" would have keywords such as "Cooking" applied to the data. Thus, keywords that describe the title are an appropriate piece of information to store because they make searches on the database more useful.
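The keyword idea can be sketched in a few lines of Python. The catalog, its field names, and the search function are all hypothetical; a real store would keep this in a database rather than a list.

```python
# Hypothetical catalog: each record stores a title plus cross-reference keywords.
books = [
    {"title": "Scandinavian Cuisine", "keywords": ["cooking", "food", "recipes"]},
    {"title": "Cooking for Two", "keywords": ["cooking", "recipes"]},
    {"title": "A History of Rome", "keywords": ["history", "europe"]},
]

def search(term, catalog):
    """Return titles whose title text or keyword list matches the search term."""
    term = term.lower()
    return [
        book["title"]
        for book in catalog
        if term in book["title"].lower() or term in book["keywords"]
    ]

print(search("cooking", books))
```

The first book is found even though "cooking" never appears in its title, which is exactly what the meta description is for.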
Data Formats
Finally, data may be encapsulated within a format that helps the consumer understand how to display it. Formats continue to define the data beyond the actual data type we described earlier. A data type tells a program what type of data something is so that basic functional operations may be performed such as adding a day to a date or adding a list of numbers together.
A format determines how that representative data type should display itself. For example, if a number is represented as Money in US Dollars, the format might be coded in such a way as to include a $ symbol in front with two decimal places for the cents in the dollars and cents that make up a US dollar figure. Likewise, in Europe, dates tend to be formatted as DD/MM/YYYY format whereas the same data might be displayed in MM/DD/YYYY format in the United States.
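To make the difference between type and format concrete, here is a small Python sketch. The amount and the date are invented sample values; the point is only that one typed value can be displayed in several ways.

```python
from datetime import date
from decimal import Decimal

amount = Decimal("1234.5")   # the type says: this is a number
when = date(2000, 3, 7)      # the type says: this is a date (7 March 2000)

# The format decides how each value is displayed.
us_money = f"${amount:,.2f}"
european_date = when.strftime("%d/%m/%Y")
us_date = when.strftime("%m/%d/%Y")

print(us_money)       # $1,234.50
print(european_date)  # 07/03/2000
print(us_date)        # 03/07/2000
```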
Summary of Data Characteristics
All of the web technologies at the Data Layer define, describe, or standardize one or more of these characteristics.
In the next few weeks we will review some of the most important data layer technologies to give you a sense of how they fit into the bigger picture. We will begin with raw data, move into databases, add a section on HTML, and conclude with an overview of XML.
We begin with a discussion of raw data.
Raw data is perhaps the most basic form of data that you will come
across when designing web applications, but is often the best choice.
Typically, you will see raw data as delimited rows such as
id,fname,lname,address,phone
id,fname,lname,address,phone
id,fname,lname,address,phone
id,fname,lname,address,phone
id,fname,lname,address,phone
id,fname,lname,address,phone
Raw data is easy to parse, easy to access and easy to write to.
Usually, data will be stored in some data file on the same file system
as your web application. Accessing data is as simple as opening,
reading, writing and closing local files. Unfortunately, as you will
see in the next section, raw data is hard to maintain and manage.
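As a rough illustration of how little code it takes to read such rows, the Python sketch below parses two comma-delimited records. The field order follows the id,fname,lname,address,phone layout shown above; the sample values are hypothetical, and a string stands in for the local data file.

```python
import csv
import io

# Hypothetical file contents; in practice these lines would come from a local file.
raw = (
    "1,Eric,Tachibana,123 Kensington,213-456-0987\n"
    "2,Selena,Sol,88 West 1st St.,987-765-4321\n"
)
fields = ["id", "fname", "lname", "address", "phone"]

# Turn each delimited row into a dictionary keyed by field name.
records = [dict(zip(fields, row)) for row in csv.reader(io.StringIO(raw))]

print(records[1]["fname"])  # Selena
```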
Once upon a time, in the primitive and
barbarian days before computers, the amount of
information shepherded by a group of people could be collected
in the wisdom and the stories of its older members. In this
world, storytellers, magicians, and grandparents were
considered great and honored storehouses for all that was
known.
Apparently, and according
to vast archeological data, campfires were used (like
command-line middleware) by the younger members of the
community to access the information stored in the minds of the
elders using APIs such as
public String TellUsAboutTheTimeWhen(String s);
And then of course, like a sweeping and
rapidly-encompassing viral infection, came agriculture,
over-production of foodstuffs, and the origins of modern-day
commerce.
Dealing with vast storehouses of wheat, rice,
and maize became quite a chore for the monarchs and emperors
that developed along with the new economy. There was simply too
much data to be managed in the minds of the elders (who by now
were feeling the effects of hardware obsolescence as they
were being pushed quietly into the background).
And so, in order to store all the new
information, humanity invented the technology
of writing. And though
great scholars like Plato warned that the invention
of writing would lead to the subtle but total demise of
the creativity and sensibility of humanity, data began to be
stored in voluminous data repositories, called books.
As we know, eventually books
propagated with great speed and soon, whole
communities of books migrated to the first real "databases",
libraries.
Unlike previous versions of data
warehouses (people and books), that might be considered
the australopithecines of the database lineage, libraries crossed over into
the modern-day species, though they were incredibly
primitive of course.
Specifically, libraries introduced
"standards" by which data could be stored and retrieved.
After all,
without standards for accessing data, libraries would be
like my closet, endless and engulfing swarms of chaos. Books,
and the data within books, had to be quickly accessible by
anyone if they were to be useful.
In fact, the usefulness of a library, or any base
of data, is proportional to its data storage and retrieval efficiency. This
one principle would drive the evolution of databases over the next 2000
years to their current state.
Thus, early librarians defined standardized
filing and retrieval protocols. Perhaps, if you have ever made
it off the web, you will have seen an old library with its cute
little indexing system (card catalog) and pointers (Dewey decimal
system).
And for the next couple thousand years
libraries grew, and grew, and grew along with associated
storage/retrieval technologies such as the filing cabinet,
colored tabs, and three ring binders.
All this until one day about half a
century ago, some really bright folks working for the British
government were asked to invent an advanced tool for breaking
German cryptographic codes and aiming missiles.
That day the world changed again. That
day the computer was born.
The computer was an intensely
revolutionary technology of course, but as with any technology, people
took it and applied it to old problems instead of using
it to its revolutionary potential.
Almost instantly, the
computer was applied to the age-old problem of
information storage and retrieval. After all, by World War Two,
information was already accumulating at rates beyond the
space available in publicly supported libraries. And besides,
it seemed somehow cheap and tawdry to store the entire
archives of "The Three Stooges" in the Library of Congress.
Information was seeping out of every crack and pore of
modern day society.
Thus, the first attempts at information
storage and retrieval followed traditional lines and metaphors.
The first systems were based on discrete files in a virtual
library. In this file-oriented system, a bunch of
files would be stored on a computer and could be accessed by
a computer operator. Files of archived data were called "tables"
because they looked like tables used in traditional file keeping.
Rows in the table were called "records" and columns were
called "fields".
Consider the following example:
First Name | Last Name | Email | Phone
Eric | Tachibana | erict@eff.org | 213-456-0987
Selena | Sol | selena@eff.org | 987-765-4321
Li Hsien | Lim | hsien@somedomain.com | 65-777-9876
Jordan | Ramacciato | nadroj@otherdomain.com | 222-3456-123
The "flat file" system was a start.
However, it was seriously inefficient.
Essentially, in
order to find a record, someone would have to read
through the entire file and hope it was not the last record.
With a hundred thousand records, you can imagine the dilemma.
What was needed, computer scientists thought
(using existing metaphors again) was a card catalog, a means to achieve random
access processing, that is the ability to efficiently access a
single record without searching the entire file to find it.
The result was the indexed file-oriented system in which
a single index file stored "key" words and pointers to records that were stored elsewhere.
This made retrieval much more efficient. It worked just like a card catalog
in a library. To find data, one needed only search for keys rather than
reading entire records.
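The difference between the two approaches can be sketched in Python. This is only an illustration under simple assumptions: the records are held in a list rather than a file, fields are separated by "|", and the last name serves as the key.

```python
# Hypothetical flat file: one record per line, fields separated by "|".
records = [
    "Eric|Tachibana|erict@eff.org|213-456-0987",
    "Selena|Sol|selena@eff.org|987-765-4321",
    "Li Hsien|Lim|hsien@somedomain.com|65-777-9876",
]

def find_by_scan(last_name):
    """Flat-file lookup: read every record until one matches."""
    for line in records:
        if line.split("|")[1] == last_name:
            return line
    return None

# Index: maps a key (last name) to a pointer (the position of the record).
index = {line.split("|")[1]: position for position, line in enumerate(records)}

def find_by_index(last_name):
    """Indexed lookup: consult the keys, then jump straight to the record."""
    position = index.get(last_name)
    return None if position is None else records[position]

print(find_by_index("Lim"))
```

Both functions return the same record, but the indexed version never has to read the records it is not interested in.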
However, even with the benefits of indexing, the
file-oriented system still suffered from problems including:
- Data Redundancy - the same data might be stored
in different places
- Poor Data Control - redundant data might be
slightly different such as in the case when Ms. Jones changes her
name to Mrs. Johnson and the change is only reflected in some of
the files containing her data
- Inability to Easily Manipulate Data - it was a
tedious and error-prone activity to modify files by hand
- Cryptic Work Flows - accessing
the data could take excessive programming effort and was too
difficult for real users (as opposed to programmers).
Consider how troublesome the following data file would be to
maintain.
Name | Address | Course | Grade
Mr. Eric Tachibana | 123 Kensigton | Chemistry 102 | C+
Mr. Eric Tachibana | 123 Kensigton | Chinese 3 | A
Mr. Eric Tachibana | 122 Kensigton | Data Structures | B
Mr. Eric Tachibana | 123 Kensigton | English 101 | A
Ms. Tonya Lippert | 88 West 1st St. | Psychology 101 | A
Mrs. Tonya Ducovney | 100 Capitol Ln. | Psychology 102 | A
Ms. Tonya Lippert | 88 West 1st St. | Human Cultures | A
Ms. Tonya Lippert | 88 West 1st St. | European Governments | A
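To see why redundant data is hard to control, consider the following Python sketch, which checks a few of the rows above for students listed under more than one address. It is an illustration only; the tuple layout is an assumption about how the file might be loaded.

```python
from collections import defaultdict

# A few rows from the data file above: (name, address, course, grade).
rows = [
    ("Mr. Eric Tachibana", "123 Kensigton", "Chemistry 102", "C+"),
    ("Mr. Eric Tachibana", "123 Kensigton", "Chinese 3", "A"),
    ("Mr. Eric Tachibana", "122 Kensigton", "Data Structures", "B"),
    ("Ms. Tonya Lippert", "88 West 1st St.", "Psychology 101", "A"),
]

# Collect every distinct address stored for each name.
addresses = defaultdict(set)
for name, address, course, grade in rows:
    addresses[name].add(address)

# A name with more than one address signals inconsistent redundant data.
conflicts = {name: sorted(found) for name, found in addresses.items() if len(found) > 1}
print(conflicts)
```

Because the address is repeated on every row, a single mistyped copy is enough to leave the file disagreeing with itself.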
What was needed was a truly unique way to deal
with the age-old problem, a way that reflected the medium of
the computer rather than the tools and metaphors it was replacing.
Enter the database.
Simply put, a database is a computerized
record keeping system. More completely, it is a system
involving data, the hardware that physically stores that data,
the software that utilizes the hardware's file system
in order to 1) store the data and 2) provide
a standardized method for retrieving or changing the
data, and finally, the users who turn the data into
information.
Databases, another creature of the 60s,
were created to solve the problems with file-oriented systems
in that they were compact, fast, easy to use, current, accurate,
allowed the easy sharing of data between multiple users, and
were secure.
A database might be as complex and
demanding as an account tracking system used by a bank to
manage the constantly changing accounts of thousands of
bank customers, or it could be as simple as a collection
of electronic business cards on your laptop.
The important thing is that a database
allows you to store data and to retrieve or modify it
easily and efficiently when you need to, regardless of the amount
of data being manipulated. What the data is and how demanding
you will be when retrieving and modifying that data is simply
a matter of scale.
Traditionally, databases ran on large,
powerful mainframes for business applications. You will
probably have heard of such packages as
Oracle 8 or
Sybase SQL Server
for example.
However with the advent of
small, powerful personal computers, databases have become
more readily usable by the average computer user.
Microsoft's Access and
Inprise's (formerly Borland's) Paradox
are two popular PC-based engines.
More importantly for our focus,
databases have quickly become integral to the design,
development, and services offered by web sites.
Consider
a site like
Amazon.com
that must be able to allow users
to quickly jump through a vast virtual warehouse of
books and compact disks.
How could Amazon.com create web
pages for every single item in their inventory, and how could
they keep all those pages up to date? The answer
is that their web pages are created on-the-fly by a program
that "queries" a database of inventory items and produces
an HTML page based on the results of that query.
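A minimal sketch of a page built on-the-fly might look like the Python below. It uses an in-memory SQLite database purely for illustration; the table name, columns, and sample items are all hypothetical, and nothing here describes how any real site is implemented.

```python
import sqlite3
from html import escape

# Hypothetical inventory held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?)",
    [("Scandinavian Cuisine", 19.95), ("A History of Rome", 24.50)],
)

def render_page(max_price):
    """Query the database and build an HTML list from the results."""
    rows = conn.execute(
        "SELECT title, price FROM inventory WHERE price <= ? ORDER BY title",
        (max_price,),
    ).fetchall()
    items = "".join(f"<LI>{escape(title)}: ${price:.2f}</LI>" for title, price in rows)
    return f"<UL>{items}</UL>"

page = render_page(20)
print(page)
```

No page exists until someone asks for it, so a change to the inventory table shows up in the very next page that is generated.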
For more information, check out my Introduction to Databases for
Web Developers
HTML, as its name implies, is a markup language. As such, it is used to
mark up text. But what exactly does it mean to mark up text?
Abstractly, marking up text is a methodology for
encoding data with information about itself. Examples of
markups (encoded data) are ubiquitous in the real world.
For example, back when you were slogging through high school,
you probably used a bright yellow highlighter pen to highlight
sentences in your schoolbooks (or at least you knew someone who did!).
You did so because you thought that the highlighted sentences would
be useful to review around exam time and you wanted a quick way to skim
through the important points. Just like you, thousands of
kids around the world did the exact same thing for the exact same reason.
By highlighting certain bits of text, you were effectively "marking-up" the data.
Essentially, you specified that certain sentences (data) were important
by marking them in yellow. These sentences became encoded with the
fact that they were important.
And what's more, since everyone followed the same
standard of marking up, you could easily pick up a used text book and
get a good idea just from reading the highlighted sections what were
core points of the book.
There are two crucial points to take away from this example. For
markups to transmit useful information about data to a pool of users...
- a standard must be in place to define what a valid markup is -
In the example above, markup is defined as a bit of yellow
ink atop text. In HTML a markup is a tag.
- a standard must be in place to define what markup means -
In the example above, a yellow highlight means the highlighted
text represents an important point. In HTML each tag communicates
its own layout or formatting meaning.
Markups are also ubiquitous in the world of computers. They are used by word processors to specify formatting and layout, by communications programs to express the meaning of data sent over the wires, by database applications that must associate meaning and relationships with the data they serve, and by multimedia processing programs which must express meta-data about images or sound.
As data is sent through dumb computers and programs, it is essential that the data carries with it information necessary to communicate what the data means
and/or what the receiver should do with that data.
Data with no context is meaningless just as an unhighlighted
book is bad news around exam time!
HTML is one of the more famous computer markup systems. HTML defines a set of tags that associate formatting rules with bits of text. Documents which have been marked up (which contain plain text as well as the tags that specify
the rules for formatting that text) are read by an HTML processing
application (a web browser for example) that knows how to display the
text according to the rules.
For example, the <B> tag specifies a rule which instructs an HTML
processing application to bold a specific bit of text. Similarly, the
<CENTER> tag instructs the HTML processing application to center
the text.
Thus <CENTER><B>BOLD</B></CENTER>
would be displayed by an HTML processing application as
BOLD
You might imagine a client contact list which could look like the
following bit of HTML code:
<UL>
<LI>Gunther Birznieks
<UL>
<LI>Client ID: 001
<LI>Company: Bob's Fish Store
<LI>Email: gunther@bobsfishstore.com
<LI>Phone: 662-9999
<LI>Street Address: 1234 4th St.
<LI>City: New York
<LI>State: New York
<LI>Zip: 10024
</UL>
<LI>Susan Czigonu
<UL>
<LI>Client ID: 002
<LI>Company: Netscape
<LI>Email: susan@eudora.org
<LI>Phone: 555-1234
<LI>Street Address: 9876 Hazen Blvd.
<LI>City: San Jose
<LI>State: California
<LI>Zip: 90034
</UL>
</UL>
The above HTML-encoded data would be displayed by an HTML processing application as:
- Gunther Birznieks
- Client ID: 001
- Company: Bob's Fish Store
- Email: gunther@bobsfishstore.com
- Phone: 662-9999
- Street Address: 1234 4th St.
- City: New York
- State: New York
- Zip: 10024
- Susan Czigonu
- Client ID: 002
- Company: Netscape
- Email: susan@eudora.org
- Phone: 555-1234
- Street Address: 9876 Hazen Blvd.
- City: San Jose
- State: California
- Zip: 90034
Is HTML a Programming Language?
Actually, though HTML is often called a programming language it is really
not. Programming languages are 'Turing-complete', or
'computable'. That is, programming languages can be used to
compute something such as the square root of pi or some other
such task. Typically programming languages use conditional
branches and loops and operate on data contained in abstract
data structures. HTML is much easier than all of that. HTML is
simply a 'markup language' used to define a logical structure
rather than compute anything. It is sort of a semantic issue,
but it is one of which you should be aware.
The language itself is fairly simple and follows a few
important standards.
Firstly, document description is defined by "HTML tags" that are
instructions embedded within a less-than (<) and a greater-than
(>) sign. To begin formatting, you specify a format type within
the < and the >. Most tags in HTML are ended with a similar
tag with a slash in it to specify an end to the formatting. For
example, to bold some text, you would use the following HTML
code:
this text is not bold
<B>this text is bold</B>
this text is not bold
It is important to note that the formatting codes within an HTML
tag are case-insensitive. Thus, the following two versions of the bold
tag would both be understood by a web browser:
<b>this text is bold</b>
this text is not
<B>this text is bold</B>
You can also combine formatting styles in HTML. However,
you should be very careful to "nest" your code correctly. For example,
the following HTML code shows correct and incorrect nesting:
<CENTER><B>this text is bolded and centered
correctly</B></CENTER>
<B><CENTER>this text is bolded and centered
incorrectly</B></CENTER>
In the incorrect version, notice that the bold tag was closed
before the center tag, even though the bold tag was opened first. The
general rule is that tags on the inside should be closed before tags on
the outside.
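The nesting rule is the same one a stack follows: the last tag opened must be the first tag closed. The Python sketch below checks a fragment against that rule. It is a simplified illustration that assumes every tag in the fragment has a matching closing tag, which is not true of all HTML (an LI, for instance, is often left unclosed), and the function name is made up.

```python
import re

def is_properly_nested(html):
    """Return True when every closing tag matches the most recently opened tag."""
    open_tags = []
    for closing, name in re.findall(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>", html):
        name = name.upper()  # tag names are case-insensitive
        if not closing:
            open_tags.append(name)
        elif not open_tags or open_tags.pop() != name:
            return False
    return not open_tags  # anything left open is also an error

print(is_properly_nested("<CENTER><B>text</B></CENTER>"))  # True
print(is_properly_nested("<B><CENTER>text</B></CENTER>"))  # False
```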
Beyond defining a formatting option, HTML tags can
also define attributes for that option. To do so, you specify
an attribute and an attribute value within the HTML tag. For example,
the following tag creates a heading style aligned to the left:
<H2 ALIGN = "LEFT">this text has a heading
level two style and is
aligned to the left </H2>
There are a few things to note about attributes however. First, it
is not necessary to enclose attribute values within quotes unless
white space is included in the value. Secondly, it is not necessary to
have a space before or after the equal sign that matches an attribute
to its value. Finally, when you close an HTML tag with an attribute,
you should not include attribute information in the closing.
Finally, you should know that web browsers do not care about
white space that you use in your HTML document. For example, the
following two bits of HTML will be displayed the exact same way:
This is some text that is displayed
as you would expect
This is some text
that is displayed in a way
you
would not expect:
exactly the same as the above
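One way to picture this behavior is that the browser collapses every run of spaces, tabs, and line breaks into a single space. The Python sketch below approximates that for ordinary text; it is not how a browser is actually implemented, and it does not apply to preformatted sections such as those inside a PRE tag.

```python
def collapse_whitespace(text):
    """Approximate how a browser treats runs of whitespace in ordinary text."""
    return " ".join(text.split())

first = "This is some text that is displayed\nas you would expect"
second = "This   is some text\n\n   that is displayed\n\tas you would expect"

print(collapse_whitespace(first) == collapse_whitespace(second))  # True
```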
Like HTML, XML (also known as Extensible Markup Language) is a markup
language which relies on the concept of rule-specifying tags and the
use of a tag-processing application that knows how
to deal with the tags.
"The correct title of this specification, and the correct full name of XML, is "Extensible Markup
Language". "eXtensible Markup Language" is just a spelling error. However, the abbreviation "XML" is
not only correct but, appearing as it does in the title of the specification, an official name of the Extensible
Markup Language.
The name and abbreviation were invented by James Clark; other options under consideration had
included MGML, (Minimal Generalized Markup Language), MAGMA (Minimal Architecture For Generalized Markup Applications), and SLIM (Structured Language for Internet Markup)" - Extensible Markup Language (XML) 1.0 Specs, The Annotated Version.
However, XML is far more powerful than HTML.
This is because of the "X". XML is "eXtensible". Specifically, rather
than providing a set of pre-defined tags, as in the case of HTML, XML
specifies the standards with which you can define your own markup
languages with their own sets of tags. XML is a meta-markup language
which allows you to define an infinite number of markup languages
based upon the standards defined by XML.
"The design goals for XML are:
- XML shall be straightforwardly usable over the Internet.
- XML shall support a wide variety of applications.
- XML shall be compatible with SGML.
- It shall be easy to write programs which process XML documents.
- The number of optional features in XML is to be kept to the absolute
minimum, ideally zero.
- XML documents should be human-legible and reasonably clear.
- The XML design should be prepared quickly.
- The design of XML shall be formal and concise.
- XML documents shall be easy to create.
- Terseness in XML markup is of minimal importance."
- Extensible Markup Language (XML) 1.0 Specs, The Annotated Version.
Let's consider a very simple example. Let's create a new markup
language called SCLML (Selena's Client List Markup Language). This
language will define tags to represent contact people and information about
contact people.
The set of tags will be simple. However, they will be expressive. Unlike
<UL> and <LI>, XML tags can be immediately understood just by
reading the document.
<CONTACT>
<NAME>Gunther Birznieks</NAME>
<ID>001</ID>
<COMPANY>Bob's Fish Store</COMPANY>
<EMAIL>gunther@bobsfishstore.com</EMAIL>
<PHONE>662-9999</PHONE>
<STREET>1234 4th St.</STREET>
<CITY>New York</CITY>
<STATE>New York</STATE>
<ZIP>10024</ZIP>
</CONTACT>
<CONTACT>
<NAME>Susan Czigonu</NAME>
<ID>002</ID>
<COMPANY>Netscape</COMPANY>
<EMAIL>susan@eudora.org</EMAIL>
<PHONE>555-1234</PHONE>
<STREET>9876 Hazen Blvd.</STREET>
<CITY>San Jose</CITY>
<STATE>California</STATE>
<ZIP>90034</ZIP>
</CONTACT>
Note that the use of XML is not limited to text markup. The very extensibility of XML means that it could just as easily be applied to sound markup or image markup. A tag such as <EMPHASIZE> might be displayed textually as being bold but audibly as a louder voice!
What you see above is a very simple "XML document". As you can see, it looks pretty similar to an HTML document.
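To give a feel for how a program might read such a document, here is a short Python sketch using the standard xml.etree.ElementTree module. One assumption is added: the contacts are wrapped in a single enclosing CONTACTS element (a hypothetical name), because an XML parser expects exactly one top-level element. Only a few of the fields are shown to keep the sample short.

```python
import xml.etree.ElementTree as ET

# Abbreviated contact list; the CONTACTS wrapper is an assumption added here.
document = """
<CONTACTS>
  <CONTACT>
    <NAME>Gunther Birznieks</NAME>
    <ID>001</ID>
    <STATE>New York</STATE>
  </CONTACT>
  <CONTACT>
    <NAME>Susan Czigonu</NAME>
    <ID>002</ID>
    <STATE>California</STATE>
  </CONTACT>
</CONTACTS>
"""

root = ET.fromstring(document.strip())

# Each piece of data is reached by the name of the tag that describes it.
names = [contact.findtext("NAME") for contact in root.findall("CONTACT")]
print(names)
```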
But don't forget, as we said before, it is not enough to simply encode
(markup) the data. For the data to be decoded by someone or something
else, the encoding markup languages must follow standard rules
including:
- The syntax for marking up
- The meaning behind the markup
In other words, a processing application must know what a valid
markup is (perhaps a tag) and what to do with it if it is valid.
After all, how would Netscape
know what to do with the above document? What in the world is a
<PHONE> tag? Is it a legal tag? How should it be displayed? Our markup language must somehow communicate the
syntax of the markup so that the processing application will know
what to do with it.
In XML, the definition of a valid markup is handled by a
Document Type Definition (DTD) which communicates the structure of
the markup language. The DTD specifies what it means to be a valid
tag (the syntax for marking up).
We'll discuss the details of DTDs later. For now, just get comfortable
with the idea of a DTD as a separate component to the equation.
Yet we must also communicate the meaning of the markup as well as the
syntax.
To specify what valid tags mean, XML documents are also associated with
"style sheets" which provide GUI
instructions for a processing application like a web browser. A style
sheet, the details of which we will discuss later, might specify display
instructions such as:
- Anytime you see a <CONTACT>, display it using a <UL>
tag. Similarly, </CONTACT> tags should be converted to </UL>
- All <NAME> tags can be replaced with <LI> tags and
</NAME> tags should be ignored.
- All <EMAIL> tags can be replaced with <LI> tags and
</EMAIL> tags should be ignored.
etc.
In this example, the style sheet utilizes the functionality of HTML to
define the formatting of SCLML. But if the XML document were being processed by a program other than a web browser, the HTML translation step might be bypassed.
Processing applications combine the logic of the style sheet,
the DTD, and the data of the SCLML document and display it according to
the rules and the data.
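As a very rough sketch of what such a processing application does, the Python below applies a tiny rule table to a contact and produces HTML. The rule table is a hypothetical stand-in for a real style sheet, not an actual style sheet language, and unlike the rules listed above it writes closing LI tags, which is harmless.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical rule table standing in for a style sheet: XML tag -> HTML tag.
RULES = {"CONTACT": "UL", "NAME": "LI", "EMAIL": "LI"}

def render(element):
    """Convert an XML element to HTML; tags without a rule keep only their content."""
    text = (element.text or "").strip()
    inner = escape(text) + "".join(render(child) for child in element)
    html_tag = RULES.get(element.tag)
    return f"<{html_tag}>{inner}</{html_tag}>" if html_tag else inner

contact = ET.fromstring(
    "<CONTACT><NAME>Susan Czigonu</NAME><EMAIL>susan@eudora.org</EMAIL></CONTACT>"
)
html_output = render(contact)
print(html_output)
```

Swapping in a different rule table would change the display without touching the data, which is the point of keeping the two apart.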
But wait, isn't this quite complex? Now instead of a single HTML
document which defines the data and the rules to display the data, we
have an SCLML document, a DTD, AND a style sheet. That's three pieces
as opposed to just one.
Further, we need a processing agent that can do the work of putting the
DTD, style sheet, and SCLML document together. Remember, web browsers
are made to read a specific markup language (like HTML), not any
markup language. That means we have three documents to pull together plus one processing program to write or buy. What a mess.
Although there are a few more hurdles to jump in order to
use XML, there are several reasons why all this is worth it. Let's take a
look at them.
Advantages of XML: Breaking the Tag Monopoly
The first benefit of XML is that because you are writing your own markup
language, you are not restricted to a limited set of tags defined by
proprietary vendors.
Rather than waiting for standards bodies to adopt tag
set enhancements (a process which can take quite some time), or for
browser companies to adopt each other's standards (yeah right!),
with XML, you can create your own set of tags at your own pace.
Of course, not only are you free to develop at your own pace,
but you are free to develop tools that meet your needs exactly.
By defining your own tags, you create the markup language in terms of
your specific problem set! Rather than relying on a generic set of
tags which suits everyone's needs adequately, XML allows
every person/organization to build their own tag library which
suits their needs perfectly.
"From the earliest days of the Web, we've been using essentially the same set of tags in our documents....There's a significant benefit to a fixed tag set with fixed semantics: portability. However, HTML is very confining. Web designers want more control over presentation. Enter XML" - Norman Walsh
That is, though the majority of web designers do not need tags to
format musical notation, medical formulas, or architectural
specifications, musicians, doctors and architects might.
XML allows each specific industry to develop its own tag sets to meet
its unique needs without forcing everyone's browser to incorporate the
functionality of zillions of tag sets, and without forcing the
developers to settle for a generic tag set that is too generic
to be useful.
Check out these customized XML-based languages:
Advantages of XML: Moving Beyond Format
However cool the idea of escaping the limitations of
a basic tag set (like HTML) sounds, it isn't even close
to the best thing about XML.
The real power of XML comes from the fact that with XML, not only
can you define your own set of tags, but the rules specified by
those tags need not be limited to formatting rules. XML allows you
to define all sorts of tags with all sorts of rules, such as tags
representing business rules or tags representing data description
or data relationships.
Consider again the case of the contact list in SCLML.
Using standard HTML, a developer might use something like the following:
<UL>
<LI>Gunther Birznieks
<UL>
<LI>Client ID: 001
<LI>Company: Bob's Fish Store
<LI>Email: gunther@bobsfishstore.com
<LI>Phone: 662-9999
<LI>Street Address: 1234 4th St.
<LI>City: New York
<LI>State: New York
<LI>Zip: 10024
</UL>
<LI>Susan Czigonu
<UL>
<LI>Client ID: 002
<LI>Company: Netscape
<LI>Email: susan@eudora.org
<LI>Phone: 555-1234
<LI>Street Address: 9876 Hazen Blvd.
<LI>City: San Jose
<LI>State: California
<LI>Zip: 90034
</UL>
</UL>
While this may be an acceptable way to store and display
your data, it is hardly the most efficient or powerful. As you are
probably aware, there are many potential problems associated with
marking up your data using HTML. Three particularly serious problems
come to mind:
- The GUI is embedded in the data. What happens if
you decide that you like a table-based presentation better than a
list-based presentation? In order to change to a table-based
presentation, you must recode all your HTML! This could mean editing
many pages.
- Searching for information in the data is tough. How would you get
a quick list of only the clients in California? Certainly, some
type of script would be necessary. But how would that script work? It
would probably have to search through the file word for word looking
for the string "California". And even if it found matches, it
would have no way of knowing that California might have a relationship
to "New York" - that they are both states. Forget about the
relationships between pieces of data which are crucial to power
searching.
- The data is tied to the logic and language of HTML. What happens
if you want to present your data in a Java applet? Well,
unfortunately, your Java applet would have to parse through the HTML
document, stripping out tags and reformatting the data. Non-HTML
processing applications should not be burdened with extraneous work.
With XML, these and similar problems are solved. In XML, the
same page would look like the following:
<CLIENT>
<NAME>Gunther Birznieks</NAME>
<ID>001</ID>
<COMPANY>Bob's Fish Store</COMPANY>
<EMAIL>gunther@bobsfishstore.com</EMAIL>
<PHONE>662-9999</PHONE>
<STREET>1234 4th St.</STREET>
<CITY>New York</CITY>
<STATE>New York</STATE>
<ZIP>10024</ZIP>
</CLIENT>
<CLIENT>
<NAME>Susan Czigonu</NAME>
<ID>002</ID>
<COMPANY>Netscape</COMPANY>
<EMAIL>susan@eudora.org</EMAIL>
<PHONE>555-1234</PHONE>
<STREET>9876 Hazen Blvd.</STREET>
<CITY>San Jose</CITY>
<STATE>California</STATE>
<ZIP>90034</ZIP>
</CLIENT>
As you can see, custom tags are used to bring meaning to the data being
displayed. When stored this way, data becomes extremely portable
because it carries with it its description rather than its display.
Display is "extracted" from the data and, as we will see later,
incorporated into a "style sheet".
Let's consider some of the benefits.
- With XML, the GUI is extracted. Thus, changes to display
do not require futzing with the data. Instead, a separate
style sheet will specify a table display or a list display.
- Searching the data is easy and efficient. Search engines can simply
parse the description-bearing tags rather than muddling in the data.
Tags provide the search engines with the intelligence they lack.
- Complex relationships like trees and inheritance can be
communicated.
- The code is much more legible to a person coming into the environment
with no prior knowledge. In the above example, it is obvious that
<ID>002</ID> represents an ID whereas <LI>002 might
not. XML is self-describing.
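As a rough illustration of the searching benefit, the following Python sketch finds clients by state using the standard library's xml.etree.ElementTree parser. It assumes the CLIENT records are wrapped in a single root element, here given the hypothetical name CLIENTS, since a parser expects one root. The records are trimmed to a few fields, and the function name clients_in_state is illustrative only.

```python
import xml.etree.ElementTree as ET

# The two CLIENT records from the text, trimmed to a few fields and
# wrapped in a hypothetical <CLIENTS> root element (an assumption).
XML_DATA = """\
<CLIENTS>
  <CLIENT>
    <NAME>Gunther Birznieks</NAME>
    <CITY>New York</CITY>
    <STATE>New York</STATE>
  </CLIENT>
  <CLIENT>
    <NAME>Susan Czigonu</NAME>
    <CITY>San Jose</CITY>
    <STATE>California</STATE>
  </CLIENT>
</CLIENTS>
"""


def clients_in_state(xml_text, state):
    """Return the names of all clients whose STATE element equals state."""
    root = ET.fromstring(xml_text)
    return [
        client.findtext("NAME")
        for client in root.findall("CLIENT")
        if client.findtext("STATE") == state
    ]


california = clients_in_state(XML_DATA, "California")
new_york = clients_in_state(XML_DATA, "New York")
print(california)
print(new_york)
```

Because the search looks only at the STATE tag, a client who lives in the city of New York but in another state would not be matched by mistake.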
However awesome XML is, there are some drawbacks which have hindered
it from gaining widespread use since its inception. Let's look at the
biggest drawback: the lack of adequate processing applications.
XML requires a processing application. That is, the nice thing
about HTML was that you knew that if you wrote an HTML document, anyone,
anywhere in the world, could read your document using Netscape. Well,
with XML documents, that is not yet the case. There are no XML browsers
on the market yet (although the latest version of IE does a pretty good job of incorporating XSL and XML documents provided HTML is the output).
Thus, XML documents must either be converted into HTML before distribution
or be converted to HTML on-the-fly by middleware. Barring translation,
developers must code their own processing applications.
The most common tactic used now is to write parsing routines in DHTML, Java, or
server-side Perl that parse through an XML document, apply the formatting
rules specified by the style sheet, and "convert" it all to HTML.
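The conversion tactic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CLIENT records are wrapped in a hypothetical CLIENTS root element, the "style" is reduced to a hypothetical COLUMNS list rather than a real style sheet, and the function name to_html_table is made up for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Trimmed CLIENT records wrapped in a hypothetical <CLIENTS> root.
XML_DATA = """\
<CLIENTS>
  <CLIENT>
    <NAME>Gunther Birznieks</NAME>
    <COMPANY>Bob's Fish Store</COMPANY>
    <STATE>New York</STATE>
  </CLIENT>
  <CLIENT>
    <NAME>Susan Czigonu</NAME>
    <COMPANY>Netscape</COMPANY>
    <STATE>California</STATE>
  </CLIENT>
</CLIENTS>
"""

# Stand-in for a style sheet: which fields to show, and in what order.
COLUMNS = ["NAME", "COMPANY", "STATE"]


def to_html_table(xml_text, columns):
    """Parse the XML and return an HTML table with one row per CLIENT."""
    root = ET.fromstring(xml_text)
    header = "".join(f"<TH>{escape(c, quote=False)}</TH>" for c in columns)
    rows = [f"<TR>{header}</TR>"]
    for client in root.findall("CLIENT"):
        cells = "".join(
            f"<TD>{escape(client.findtext(c, default=''), quote=False)}</TD>"
            for c in columns
        )
        rows.append(f"<TR>{cells}</TR>")
    return "<TABLE>\n" + "\n".join(rows) + "\n</TABLE>"


html_table = to_html_table(XML_DATA, COLUMNS)
print(html_table)
```

Switching to a list-based display would mean changing only the conversion routine, not the XML data.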
"While it's true that browser support is limited, IE 5 and
Netscape 5 are expected to fully support XML. Also, W3C's Amaya browser
supports it today, as does the JUMBO browser that was created for
the Chemical Markup Language.
XML isn't about display -- it's about structure. This has implications
that make the browser question secondary. So the whole issue of what is to
be displayed and by what means is intentionally left to other
applications. You can target the same XML (with different XSL) for
different devices (standard web browser, palm pilot, printer, etc.).
You should not get the impression that XML is useless until browsers
support it. This is definitely not true -- we are using it at NASA in
ways where no browser plays any role." - Ken Sall
However, this takes some magic, and the amount of work necessary
even to print "hello world" is sometimes enough to
dissuade developers from adopting the technology.
Nevertheless, parsing algorithms and tools continue to improve over time
as more and more people see the long-term benefits of migrating their
data to XML. The backend part of XML will continue to become simpler and
simpler. Already Internet Explorer and Netscape provide a decent amount
of built-in XML parsing tools.
Essentially, style sheets are written instructions explaining how a certain
document should be displayed. Style sheets are as old as the printing
press and probably older.
Frank Boumphrey, in "Style Sheets for HTML and XML", puts style sheets
into their historical perspective as follows:
"In the days of manual type-setting, style sheets were
nothing more than a set of written instructions from the publisher to
the printer telling the printer what kind of style to use when printing
up the publisher's manuscript. Traditionally, the editor would deliver
a "marked-up" manuscript full of terse notations like dele and stet.
The printer would then consult the style sheet of that particular
publishing house for a range of specifications. These specifications
would encompass such details as what size the pages were to be, what
size and family of font to use for chapter titles, sub-headings, body
text, and so on. Plus, how much leading to put between the lines, what
margins to leave, and whether, and by how much, to indent the
paragraphs."
As we have said above, HTML is a markup language that helps define how
the web browser should display a given HTML marked up document.
However, we have also said that it is dangerous to embed too much style
into your HTML code. What happens when the company decides to change
its corporate font? What about the company colors? If you have
hardcoded style throughout your website, it is very difficult to change.
This is where style sheets come into play. Style sheets allow you to
specify generic styles that apply broadly but are defined in one place. In
other words, a single style sheet can be referenced by every web
page on a site (or every component on a page if the style sheet
is defined in the page header).
If you want to change an aspect of style such as color,
font, spacing, or whatever, you simply change the style sheet and that
change is propagated to every page that references the style sheet.
There are several languages/specifications to help you create style
sheets for your site, but the two most popular are CSS (Cascading Style Sheets)
and XSL (eXtensible Style Sheet Language), with CSS certainly holding the
larger market share.
CSS rules can be placed inside a <STYLE> tag in the page header. Each
rule names the element it applies to, followed by the style properties
to use for that element. Consider the following small
example:
<HTML>
<HEAD>
<TITLE>Test</TITLE>
<STYLE>
<!--
TD {FONT-FAMILY: "TimesRoman", "Times New Roman", serif;}
BODY {FONT-FAMILY: Arial, sans-serif;}
-->
</STYLE>
</HEAD>
<BODY>
This is in Arial
<TABLE>
<TR>
<TD>This is in TimesRoman</TD>
</TR>
</TABLE>
</BODY>
</HTML>
As you might imagine, BODY text will be displayed in Arial and table
cell data will be displayed in a Times Roman font.