Internet Quotes Assistant

 

Quotation Server Settings
(Last updated in March 2007)

 

 

Automatic Quotation Download/Update

 

Internet Quotes Assistant (IQA) downloads web pages from the internet like an ordinary web browser. After download, a page can optionally be converted to a simplified format before the quotations are extracted from it. The extracted quotations are then used to update the portfolio.

 

Internet Quotes Assistant is distributed with some pre-configured quotation servers, but you can also specify your own quotation servers.

 

In order to automatically update quotations available on the internet, you must know:

 

i)                    The URL of the web page, where the quotation can be found,

ii)                  The exact position of the quotation on that page.

 

The following steps are required to configure a new quotation server:

 

1.       Find a web page that offers quotations on a regular basis (e.g. http://www.abc.com/quotes.html)

2.       Define the position of the desired quotation on the page

3.       Test the automatic quotation update

 

IQA’s Server Configuration Wizard will help you set up your quotation server configuration. However, you can also configure the server settings manually. The rest of this document explains how to set the IQA server settings manually.

 

Notes:

 

 

 

1. Finding a quotation server

 

You can use one of the popular search engines available on the internet (e.g. Google, Yahoo, etc.) to find a web site providing the quotation you are interested in.

 

When you’ve found a page containing the quotations you are interested in, you can copy the URL address from the browser window and paste it directly in the URL field of IQA’s Server Properties window.

 

Some details:

·              If the quotation is placed on a web page that uses frames, you will have to find out the exact page address. In this case you can press the right mouse button in the browser window and select the properties option of the context menu to get the correct URL address of the desired frame.

·              If the quotation symbol is part of the URL, e.g. http://www.abc.com/quote?symbols=MSFT, you can use the variable [&symbol] instead of "MSFT" in the URL field, so that the server configuration can be used as a template for other quotations. The variable [&symbol] will be replaced by the item’s symbol (e.g. "MSFT") when the page is to be downloaded.

·              If the web page address changes regularly (for example, because it contains the date, as in http://www.abc.com/quotation0715.html), you can use the variables [&d], [&dw], [&m], [&mstr], [&y]  (day; day of the week: mo=1, tu=2, ...; month; month string: jan, feb, ...; year) in the URL field, so that IQA always loads the current page. For the example above, assuming 0715 stands for July 15, the URL would be: http://www.abc.com/quotation[&m][&d].html.
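The variable substitution described above can be sketched in a few lines of Python. This is only an illustration, not IQA's own code: the function name expand_url, the zero-padded two-digit day and month, the four-digit year and the month abbreviations after "feb" are all assumptions.

```python
from datetime import date

# Month strings; the text only lists "jan, feb, ...", the rest is assumed.
MONTH_STRINGS = ["jan", "feb", "mar", "apr", "may", "jun",
                 "jul", "aug", "sep", "oct", "nov", "dec"]


def expand_url(template: str, symbol: str, today: date) -> str:
    """Replace IQA-style variables in a URL template (hypothetical helper)."""
    values = {
        "[&symbol]": symbol,
        "[&d]": f"{today.day:02d}",        # assumed: two-digit day
        "[&dw]": str(today.isoweekday()),  # mo=1, tu=2, ... as in the text
        "[&m]": f"{today.month:02d}",      # assumed: two-digit month
        "[&mstr]": MONTH_STRINGS[today.month - 1],
        "[&y]": str(today.year),           # assumed: four-digit year
    }
    for variable, value in values.items():
        template = template.replace(variable, value)
    return template


print(expand_url("http://www.abc.com/quote?symbols=[&symbol]",
                 "MSFT", date(2007, 7, 15)))
# http://www.abc.com/quote?symbols=MSFT
```

Because one template serves every item, the same server configuration can be reused for any symbol simply by changing the item's symbol.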

 

 

 

2. Definition of the position of the desired quotation on the page

 

The easiest way to define the position of the desired quote on a web page is to use IQA’s Server Configuration Wizard. However, you can also do it manually. Depending on the structure of the web page, this task might require some effort. Before you start, take a look at the examples that are distributed with IQA.

 

IQA offers the possibility to define basic and advanced data extraction rules.

 

 

2A) Basic Quotation Extraction Rules

Most web pages are written in Hypertext Markup Language (HTML). This means that they contain not only text but also format tags, such as <B> and </B>, which define the layout of the text.

 

Usually a quotation is found after a specific text that appears only once on the page. Enter this text in the field Quotation after the following text of the server configuration dialog, for instance "XYZ Inc.  Last quotation:" (this field may also contain the IQA variables [&d], [&dw], [&m], [&mstr], [&y], [&name], [&symbol]). Please note that the text preceding a quotation often contains HTML tags that are not directly visible. You should therefore first take a look at the source code of the web page (Internet Explorer: View menu, Source option; Firefox: View menu, Page Source option). If you are dealing with a simple web page, you can have it converted to text format, so that you no longer need to pay attention to HTML tags.

 

IQA offers three page conversion options, of which you choose one:

·                    None
The page is not converted at all; it appears exactly as you see it when you choose the option Source from the View menu of Internet Explorer.

·                    Semi-HTML (only empty HTML tags "<>"):
All HTML tags are replaced by "<>"s.
(e.g. "Amazon Inc. <B><font size="1">197.02</font></B>"
results in "Amazon Inc. <><>197.02<><>")
This way you can specify the number of HTML tags that appear between the defined text and the quotation. This is useful when the quotation is placed in a large table and you don’t want to specify the contents of the HTML tags that appear before it. Moreover, you can simply specify the number of HTML tags to skip (after the specified text) to reach the quotation, ignoring whatever is between or before the HTML tags.

 

·                    Text-only:
All HTML tags are removed.
(e.g. "Amazon Inc. <B><font size="1">197.02</font></B>"
results in "Amazon Inc. 197.02")
This is very useful when you are dealing with a simple page that is structured like a table with two columns (Stock id, Quotation).

When the Semi-HTML option is chosen, you can count the HTML tags that appear between the defined text and the desired quotation.
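The Semi-HTML and Text-only conversions can be approximated with a single regular expression. The following Python sketch is an assumption about how such a conversion might work, not IQA's actual implementation; the names to_semi_html and to_text_only are hypothetical, and the simple tag pattern does not handle comments or scripts.

```python
import re

# Matches one HTML tag, e.g. <B>, </font> or <font size="1">.
TAG = re.compile(r"<[^>]*>")


def to_semi_html(page: str) -> str:
    """Replace every HTML tag by an empty tag "<>"."""
    return TAG.sub("<>", page)


def to_text_only(page: str) -> str:
    """Remove every HTML tag."""
    return TAG.sub("", page)


sample = 'Amazon Inc. <B><font size="1">197.02</font></B>'
print(to_semi_html(sample))  # Amazon Inc. <><>197.02<><>
print(to_text_only(sample))  # Amazon Inc. 197.02
```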

 

In general, you can define how many numbers appear after the text (and, where applicable, after the HTML tags) but before the desired quotation. This is useful if several quotations are given on the same line and only one of them is of interest. For instance, consider the following page fragment:

 

<TABLE WIDTH="580" BORDER="0" CELLSPACING="0" CELLPADDING="0">

<TR> <TD>Stock</TD> <TD>Difference in %</TD> <TD>Last quotation</TD> </TR>

<TR> <TD>Amazon Inc.</TD> <TD>3.5</TD> <TD>192.02</TD> </TR>

 

In this case, the following IQA server settings could be used:

 

I)

Conversion: Text-only

Quotation after the following text: Amazon Inc.

and [ 0 ] HTML tags and [ 1 ] Number(s)

 

Converted text:

 Stock Difference in % Last quotation

 Amazon Inc. 3.5 192.02

 

II)

Conversion: Semi-HTML

Quotation after the following text: Amazon Inc.

and [ 4 ] HTML tags and [ 0 ] Number(s)

 

Converted text:

<> 

<> <>Stock<> <>Difference in %<> <>Last quotation<> <>

<> <>Amazon Inc.<> <>3.5<> <>192.02<> <>
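Both settings above can be read as a small search procedure: find the text, skip the given number of empty tags, skip the given number of numbers, and take the next number as the quotation. The Python sketch below illustrates this reading on the two converted texts. It is not IQA's own code; the function name extract_quotation and the number pattern are assumptions.

```python
import re

# A number: digit groups optionally separated by "." or "," (assumed format).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def extract_quotation(page, after_text, skip_tags=0, skip_numbers=0):
    """Return the quotation as a string, or None if it is not found."""
    pos = page.find(after_text)
    if pos < 0:
        return None
    pos += len(after_text)
    # Skip the requested number of empty tags, ignoring what lies between them.
    for _ in range(skip_tags):
        tag_pos = page.find("<>", pos)
        if tag_pos < 0:
            return None
        pos = tag_pos + len("<>")
    # Skip the requested number of numbers and return the next one.
    numbers = NUMBER.findall(page[pos:])
    if len(numbers) <= skip_numbers:
        return None
    return numbers[skip_numbers]


text_only = " Stock Difference in % Last quotation\n Amazon Inc. 3.5 192.02"
semi_html = ("<>\n<> <>Stock<> <>Difference in %<> <>Last quotation<> <>\n"
             "<> <>Amazon Inc.<> <>3.5<> <>192.02<> <>")

print(extract_quotation(text_only, "Amazon Inc.", skip_tags=0, skip_numbers=1))
# 192.02
print(extract_quotation(semi_html, "Amazon Inc.", skip_tags=4, skip_numbers=0))
# 192.02
```

In setting I the number 3.5 is skipped explicitly; in setting II it is skipped implicitly, because it lies between the four tags.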

 

 

 

2B) Advanced Quotation Extraction Rules

Advanced quotation extraction rules allow you to extract more detailed quotation information (quotation date/time, open, close, high, low and volume) by specifying regular expressions.

 

For more information on regular expressions please refer to the following links:
http://perldoc.perl.org/perlre.html#Regular-Expressions, http://www.cc.gatech.edu/classes/RWL/Projects/citation/Docs/Design/regex.intro.1.doc.html, http://www.emerson.emory.edu/services/editors/ne/Regular_Expressions.html.

 

A very good graphical tool for testing regular expressions is The RegExp Coach provided by Dr. Edmund Weitz. The RegExp Coach is free for private use. It is very useful for learning how regular expressions work.

 

In an advanced quotation extraction rule, you can specify the page conversion option (None, Semi-HTML and Text-only) and the numbering format (decimal and thousands separator) exactly as you do in a basic quotation extraction rule. The difference lies only in how the quotation position in the web page is specified.
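The role of the numbering format can be illustrated with a short Python sketch that turns an extracted string into a number. The function name parse_quotation and its parameters are hypothetical and only show why both separators must be known.

```python
def parse_quotation(raw: str, decimal_sep: str = ".",
                    thousands_sep: str = ",") -> float:
    """Convert an extracted quotation string to a float (illustrative only)."""
    # Drop the thousands separator first, then normalize the decimal separator.
    cleaned = raw.replace(thousands_sep, "") if thousands_sep else raw
    cleaned = cleaned.replace(decimal_sep, ".")
    return float(cleaned)


print(parse_quotation("1,234.56"))                                      # 1234.56
print(parse_quotation("1.234,56", decimal_sep=",", thousands_sep="."))  # 1234.56
```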

 

Regular expressions provide a powerful way of searching for and extracting complex data from a document. They allow you to specify the exact format of the data to be extracted. The format definition might include spaces and separators such as commas, periods, colons, quotes, tabs ("\t"), carriage returns ("\r"), new lines ("\n"), etc.

 

 

IQA provides the following data extraction placeholders:

 

These data extraction placeholders help you specify the regular expression to be matched. IQA internally converts them to regular expressions, e.g. [&quotation_close] = ([0-9,.]+). Note that only the placeholder [&quotation_close] is mandatory. If the other placeholders are left unused, the current date/time is used and the remaining quotation values are set to 0.
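The placeholder conversion can be sketched in Python as follows. Only the mapping [&quotation_close] = ([0-9,.]+) is taken from this document; the function name rule_to_regex and the sample rule are assumptions made for illustration, and IQA's regular expression dialect may differ from Python's.

```python
import re

# Only this mapping is documented above; further placeholders are omitted.
PLACEHOLDERS = {"[&quotation_close]": r"([0-9,.]+)"}


def rule_to_regex(rule: str) -> re.Pattern:
    """Expand placeholders in a rule and compile the result (hypothetical)."""
    pattern = rule
    for placeholder, regex in PLACEHOLDERS.items():
        pattern = pattern.replace(placeholder, regex)
    return re.compile(pattern)


# Sample rule: the company name, one number to skip, then the closing quotation.
rule = r"Amazon Inc\.\s+[0-9.]+\s+[&quotation_close]"
match = rule_to_regex(rule).search(" Amazon Inc. 3.5 192.02")
print(match.group(1))  # 192.02
```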

 

Remarks:

 

 

Here are some sample advanced quotation extraction rules:

 

 

 

 

3. Test quotation download/update

 

In the Server Configuration dialog you can press the button “Test Configuration…” to test a new quotation extraction rule.

 

During the test phase, also make sure that the option "Open downloaded file in editor when quotation is not found" (menu Extras  –  Options…) is activated, so that you can take a look at the converted file. If the quotation is not found, it will then be easier to figure out how to adapt the server settings.

 


 

 

If you have any questions, you can contact the author by e-mail:

Marcos Rocha (mailto:IQA@gmx.net)