Tips for Web (HTML) Authors
How we can help you:
- I want to be at the top of the search results! How do I obtain a higher page rank?
- I do not want my page to be searchable. How do I prevent my page being indexed? How do I prevent my links being followed by the crawler?
- How do I prevent the Search Appliance caching my page or showing a snippet of my page in the search results?
- My page doesn't come up in the search results! How do I add it to the index?
- How do I create a link or Web form to search using the Search Appliance?
How you can help us:
- Pass session data through cookies, not the URL
Increasing page rank
Web pages with a high page rank will tend to appear first in the search results among pages that match the search terms. The Search Appliance determines page rank using many criteria. There is nothing we can do on the Search Appliance to alter a page's rank.
- How page rank is determined [google.com]
Preventing indexing and further crawling
Certain types of Web pages typically have little or no value in a search index. Examples include "Document Not Found" (404) error pages, comment or reply forms for blog entries, and "printer versions" of news articles. A comment form is generic and meant for data input rather than output to the reader, and the printer version of an article only duplicates content already indexed in the regular version.
You can use META tags in the HEAD of an HTML document to prevent the Search Appliance from adding the document to its search index, or to prevent the crawler from following links from the document to other documents. The META name attribute is "robots", and the content attribute must contain "noindex", "nofollow", or both:
<meta name="robots" content="noindex" />
<meta name="robots" content="nofollow" />
<meta name="robots" content="noindex,nofollow" />
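As a sketch of where the tag goes, here is a hypothetical "printer version" page, one of the low-value page types mentioned above. The title and body text are placeholders; only the META tag matters. Because the content is "noindex" without "nofollow", the crawler may still follow the links on this page.

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Printer version: Example news article</title>
    <!-- Keep this duplicate copy out of the search index;
         the crawler may still follow links on the page -->
    <meta name="robots" content="noindex" />
  </head>
  <body>
    <p>Article text goes here.</p>
  </body>
</html>
```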
- Learn about directory or site-wide exclusion techniques
- Request removal of a URL from the Search Appliance document index
Suppressing page caching and snippet display
The image below is an example of an item from a search results list. The first line is the document title, the next two lines are a document snippet in the context of the search terms, and the last line, after the document URL, contains a "Cached" link that retrieves a cached copy of the document from the Search Appliance.
To disable the snippet display and the "Cached" link for an HTML page, place the following META tag in the HEAD of the document:
<meta name="googlebot" content="nosnippet" />
To prevent the document being cached in the Search Appliance while allowing a snippet in the search results, use this META tag instead:
<meta name="robots" content="noarchive" />
Submitting your page to the Search Appliance index
We can add your Web page to the Search Appliance index if it is not appearing in the search results. To test whether your Web page has been indexed by the Search Appliance, try searching for an exact phrase that appears on your page, enclosing it in quotation marks. Choose a phrase you think will be unique to your page. Look carefully through the list of search results for your page.
Sometimes you will see this message at the bottom of the last page of results: "In order to show you the most relevant results, we have omitted some entries very similar to the [number] already displayed. If you like, you can repeat the search with the omitted results included." Click "repeat the search" and look again for your page. If you find it after including omitted results, the Search Appliance has already indexed your page (do not submit its URL), but it has determined that your page is similar to higher-ranked pages in the index. See the section on page rank.
If you need to add your entire Web site, please do not submit a URL for every page on your site. Typically, you will only need to submit the URL of your home page. The Search Appliance will use its crawler to start at your submitted URL and follow successive links into your site, indexing pages as it goes.
Adding a search form or link to your site
To allow your Web visitors to search all University Web content, you may provide a link to search.umn.edu. This is the "Search U of M" link that appears in the top banner of University Web sites using the standard Web Depot templates.
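For example, a plain link like the following is all a University-wide search requires. The link text is only a suggestion that mirrors the banner link in the Web Depot templates.

```html
<!-- Sends visitors to the University-wide search page -->
<a href="http://search.umn.edu/">Search U of M</a>
```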
To search within a single Web site, or part of a site, you do not need to apply for Search Appliance service. Simply use the HTML code below. Note that you may omit the sitesearch input element for a University-wide search.
<form action="http://google.umn.edu/search" method="GET">
  <input type="text" name="q" maxlength="256" />
  <input type="hidden" name="site" value="default_collection" />
  <input type="hidden" name="client" value="searchumn_generic" />
  <input type="hidden" name="proxystylesheet" value="searchumn_generic" />
  <input type="hidden" name="output" value="xml_no_dtd" />
  <input type="hidden" name="sitesearch" value="website_url_base" />
  <input type="submit" value="search button text" />
</form>
- client=searchumn_generic - The "client" parameter can be set to searchumn if you would like context-sensitive University links to appear above the search results. For example, a search for "technology" might display a link to the Institute of Technology.
- sitesearch=website_url_base - The starting portion of the URL that is common to all pages of your site. For example, this Web site uses www.umn.edu/google. Do not include the http:// part.
- proxystylesheet=searchumn_generic - If you want the search results page to look exactly like that of "search.umn.edu", set this parameter to searchumn.
- search button text - The text to appear on the form's submit button.
- other search parameters - The Google Search Protocol Reference describes these and other search form parameters in more detail.
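As a sketch, here is the search form filled in for the example site named in the sitesearch description (www.umn.edu/google), with "Search this site" as hypothetical button text. The other values are left as in the template, with comments noting the alternatives described above.

```html
<!-- Search only pages whose URLs begin with www.umn.edu/google -->
<form action="http://google.umn.edu/search" method="GET">
  <!-- The visitor's search terms -->
  <input type="text" name="q" maxlength="256" />
  <input type="hidden" name="site" value="default_collection" />
  <!-- Set to "searchumn" to show context-sensitive University links -->
  <input type="hidden" name="client" value="searchumn_generic" />
  <!-- Set to "searchumn" to match the look of search.umn.edu -->
  <input type="hidden" name="proxystylesheet" value="searchumn_generic" />
  <input type="hidden" name="output" value="xml_no_dtd" />
  <!-- No "http://" prefix -->
  <input type="hidden" name="sitesearch" value="www.umn.edu/google" />
  <input type="submit" value="Search this site" />
</form>
```

Because the form uses method="GET", the browser submits it by requesting a URL whose query string carries the same name/value pairs. A search for parking, for instance, would request roughly http://google.umn.edu/search?q=parking&site=default_collection&client=searchumn_generic&proxystylesheet=searchumn_generic&output=xml_no_dtd&sitesearch=www.umn.edu%2Fgoogle, so a URL of that shape can also serve as a ready-made search link (write each & as &amp; inside an href attribute).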
If you are a search manager in charge of a Search Appliance front end and collection, please see our search manager guides.
- Web Depot: University of Minnesota Web site templates
- Apply for a custom search page
Pass session data through cookies, not the URL
The Search Appliance treats two documents as distinct if they have different URLs. If you have worked with Web applications or CGI scripts, you know that a single "page" can have an indefinite number of URLs, simply by adding arbitrary characters after a '?' in the URL (the query string). Web application frameworks in PHP, ASP, and ColdFusion, to name a few, offer mechanisms to pass user session tokens in the URL when requesting a page. Since these tokens are random and change frequently, the Search Appliance will recrawl such session-enabled pages indefinitely, assuming each URL represents a unique page. This inflates our limited search index and places an unnecessary load on both your Web server and the Search Appliance.
The solution is to use cookies instead of URL tokens to pass session data. This is often an option in the Web application settings.
| Web application framework | Token name |
|---|---|
| PHP | PHPSESSID |
| ASP | ASPSESSIONID |
| ColdFusion | CFID, CFTOKEN |
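To illustrate the problem, the hypothetical links below all point at the same ColdFusion page. With URL-based sessions, different visits receive different CFID and CFTOKEN values, so the Search Appliance sees a new URL each time; with cookie-based sessions the tokens travel in a cookie and the page keeps a single URL. The page name and token values are made up.

```html
<!-- URL-based sessions: the same page appears under two different URLs -->
<a href="news.cfm?id=42&amp;CFID=1234&amp;CFTOKEN=56781234">Campus news</a>
<a href="news.cfm?id=42&amp;CFID=5678&amp;CFTOKEN=90129012">Campus news</a>

<!-- Cookie-based sessions: one stable URL for every visitor -->
<a href="news.cfm?id=42">Campus news</a>
```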