V.Deepthi 1210A048
In 1990, Archie was developed. It assembled lists of files available on many FTP servers and allowed regex search of these file names.
In 1993, Veronica and Jughead were developed to search names of text files available through Gopher servers.
In 1994, Stanford grad students David Filo and Jerry Yang started manually collecting popular web sites into a topical hierarchy called Yahoo.
In late 1995, DEC developed AltaVista. It used a large farm of Alpha machines to quickly process large numbers of queries, and supported boolean operators, phrases, and reverse pointer queries.
In 1998, Larry Page and Sergey Brin, Ph.D. students at Stanford, started Google. Its main advance was the use of link analysis to rank results partially based on authority.
Web Challenges
Distributed Data: Documents spread over millions of different web servers.
Volatile Data: Many documents change or disappear rapidly (e.g. dead links).
Large Volume: Billions of separate documents.
Unstructured and Redundant Data: No uniform structure, HTML errors, up to 30% (near) duplicate documents.
Quality of Data: No editorial control, false information, poor quality writing, typos, etc.
Heterogeneous Data: Multiple media types (images, video, VRML), languages, character sets, etc.
The Yahoo approach uses human editors to assemble a large, hierarchically structured directory of web pages.
http://www.yahoo.com/
Open Directory Project is a similar approach based on the distributed labor of volunteer editors (net-citizens provide the collective brain). Used by most other search engines. Started by Netscape.
http://www.dmoz.org/
Advertisers pay for banner ads on the site that do not depend on a user's query.
CPM: Cost Per Mille (thousand impressions). Pay for each ad display.
CPC: Cost Per Click. Pay only when the user clicks on the ad.
CTR: Click-Through Rate. Fraction of ad impressions that result in click-throughs.
CPC = CPM / (CTR * 1000)
CPA: Cost Per Action (Acquisition). Pay only when the user actually makes a purchase on the target site.
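The relation CPC = CPM / (CTR * 1000) can be checked with a short sketch. The function name and the example prices below are hypothetical, chosen only to illustrate the arithmetic.

```python
def cpc_from_cpm(cpm: float, ctr: float) -> float:
    """Effective cost per click implied by a CPM price and a click-through rate.

    cpm: price paid per thousand impressions
    ctr: fraction of impressions that are clicked (0 < ctr <= 1)
    """
    if not 0 < ctr <= 1:
        raise ValueError("CTR must be a fraction in (0, 1]")
    # One impression costs cpm / 1000; one click takes 1 / ctr impressions.
    return cpm / (ctr * 1000)


# Hypothetical campaign: $5 per thousand impressions, 2% of impressions clicked.
print(cpc_from_cpm(cpm=5.0, ctr=0.02))
```

With these assumed numbers, 1,000 impressions cost $5 and yield 20 clicks, so each click effectively costs $0.25.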
Advertisers bid for keywords. Ads for the highest bidders are displayed when a user query contains a purchased keyword.
PPC: Pay Per Click. CPC for bid word ads (e.g. Google AdWords).
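Keyword bidding can be sketched as follows. This is a simplified illustration, not how any real ad system ranks ads: the function name, the advertiser names, the bid amounts, and the rule of ranking purely by bid are all assumptions.

```python
def select_ads(query, bids, slots=2):
    """Return up to `slots` advertisers, highest bid first, whose purchased
    keyword appears as a word in the query.

    bids: list of (keyword, advertiser, bid_amount) tuples
    """
    words = set(query.lower().split())
    # Keep each advertiser's best bid among keywords that occur in the query.
    best = {}
    for keyword, advertiser, amount in bids:
        if keyword.lower() in words:
            best[advertiser] = max(amount, best.get(advertiser, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [advertiser for advertiser, _ in ranked[:slots]]


# Hypothetical bids, in dollars per click.
bids = [
    ("dishwasher", "shop-a.example", 0.40),
    ("dishwasher", "shop-b.example", 0.65),
    ("vacuum", "shop-c.example", 0.30),
    ("oven", "shop-a.example", 0.90),
]
print(select_ads("quiet dishwasher reviews", bids))
# prints ['shop-b.example', 'shop-a.example']
```

Only the "dishwasher" bids match the query, so the advertiser bidding $0.65 is listed ahead of the one bidding $0.40. Under PPC, an advertiser would be charged only if its ad is clicked.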
[Figure: an IR system takes a document corpus and a query string as input and returns ranked documents.]
Interesting problems
How to pick the top 10 results for a search from 2,230,000 matching pages?
What ads to show for a search?
If I'm an advertiser, which search terms should I bid on, and how much should I bid?
[Figure: web search engine components (User, Web, Web crawler, Indexer, Search), shown with sample search results from www.miele.com and www.miele.co.uk.]
Web advertising is the action of promoting your website using online advertising tools, techniques, and methods proven to get the results you are looking for. The term is used interchangeably with online advertising.
Online advertising is basically the action of actively promoting your new business.
"The signposting should give a concise and accurate idea of what they can expect to find when they get there with that precious click. What happens after that is another matter." - Zsolt Kerekes, editor of STORAGEsearch
A Tidbit on Pop-Ups
Pop-ups are the single biggest annoyance on the Internet. Yet pop-up advertising is growing faster than any other form of online advertising. "Any survey we've seen shows that users dislike pop-ups more than almost any other ad format," said David Hallerman, senior analyst at marketing-research firm eMarketer. "[But] we see online advertising growing 25% this year, and [ad ware] surpassing it by 10%."
Top sites for pop-up/pop-under ads for May 2004
CNN, ESPN.com, MSN, Yahoo!
Pay-For-Placement (PFP)
As long as you bid for one of the top two or three positions, you are guaranteed to be displayed at the top of the results for the search engine and its partners.
Pay-For-Inclusion (PFI)
A search engine includes your website pages in its index in exchange for payment, generally for a period of six months to one year. This does not mean your page will appear in the top position.
Google Adwords
Keywords you pick for your site are matched against those products or services people have expressed an active desire to get information on.
Paid search results are the hottest business on the Web, so it's little surprise the two titans of search are colliding.
Google's revenues were $390 million in the first quarter, up 118% from a year ago. Yahoo moved into the business forcefully when it acquired a paid search company called Overture last year.
The hottest spots include the home pages of the Big Three: Yahoo, MSN, America Online.
Marketers generally buy the home-page ad for 24-hour periods. Space on these sites may have to be booked up to a year in advance.
The Effects of Phishing and Spoofing on Web Advertising
Phishing and spoofing occur when scammers dupe Web users into divulging account and other personal information by pretending to represent known brands.
How can a marketer deal with phishy e-mail and spoofing scamsters?
Adopt technology that certifies legitimate mail.
Incorporate toolbars that warn users that they may be entering shady parts of the Internet. Auction site eBay (EBAY) has one that stays green when users are on eBay, goes gray when they leave the site, and sends out a pop-up message when they stumble onto a known spoof site.
Use software that can help companies react when targeted by tainted mail, blunting the damage to customers.
Check with your Internet service providers. Some are developing so-called "black lists" that block e-mail from known spammers. In the future, these could be turned into "white lists," so that only e-mail that has been verified from legitimate sources makes it through.
Marketers should never ask for personal information or link to a page that asks for personal data.
For now, the best defense for marketers is strong and consistent branding, so customers can tell the difference between a real e-mail and a phishing attack.
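The difference between a "black list" and a "white list" can be sketched in a few lines. This is only an illustration of the two policies: the function names and domains are hypothetical, and the sketch assumes the sender address has already been verified by some other means, which a real mail system could not take for granted.

```python
def sender_domain(address):
    """Return the lowercased domain part of an e-mail address."""
    return address.rsplit("@", 1)[-1].lower()


def blacklist_allows(address, blacklist):
    """Black-list policy: deliver unless the sender's domain is a known spammer."""
    return sender_domain(address) not in blacklist


def whitelist_allows(address, whitelist):
    """White-list policy: deliver only if the sender's domain has been verified."""
    return sender_domain(address) in whitelist


# Hypothetical lists.
blacklist = {"spam.example"}
whitelist = {"bank.example"}

for sender in ["offers@spam.example", "alerts@bank.example", "news@unknown.example"]:
    print(sender, blacklist_allows(sender, blacklist), whitelist_allows(sender, whitelist))
```

The unknown sender shows the difference: the black list lets it through because it is not a known spammer, while the white list blocks it because it has not been verified.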
Don't trust e-mail headers, which can be forged easily.
Avoid filling out forms in e-mail messages. You can't know with certainty where the data will be sent, and the information can make several stops on the way to the recipient.
Try not to click on links in an e-mail message from a company. Too many scam artists are making forgeries of companies' sites that look like the real thing.
If you go to a link offered in an unsolicited e-mail, check to see if there is an 's' after the http in the address and a lock at the bottom of the screen. Both are indicators that the site is secure.
If you want to do business online, don't click on an e-mail link. Go to the company's Web site yourself and fill out information there.
Review credit card and bank account statements as soon as you receive them to determine whether there are any unauthorized charges. If your statement is late by more than a couple of days, call your credit card company or bank to confirm your billing address and account balances.
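The check for an 's' after the http can be expressed as a small sketch using Python's standard urllib.parse module. The function name and the example addresses are hypothetical. The check covers only the address prefix; it does not look for the lock icon or establish that the site is genuine.

```python
from urllib.parse import urlparse


def uses_https(url):
    """Return True if the address starts with https, False otherwise."""
    return urlparse(url).scheme == "https"


# Hypothetical addresses.
print(uses_https("https://www.example.com/login"))
print(uses_https("http://www.example.com/login"))
```

The first call returns True and the second returns False.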
The fraud can be perpetrated very quickly, and afterward, the perpetrator can "vanish" into cyberspace.
The phony websites typically migrate from one server to another very rapidly -- in an effort to stay a step ahead of ISPs and law enforcement.
The average phishing web site is online for only about 54 hours, according to June data from the APWG. Some sites, however, have been able to remain online for more than two weeks before being shut down or abandoned.
Existing federal laws do criminalize phishing -- but mainly after the damage is done, when a consumer has already been defrauded as a result of the phishing. Those measures include the laws against wire fraud, identity theft, credit card fraud, computer fraud, and a number of trade laws -- and may even encompass the new federal CAN-SPAM Act.
Many phishers appear to send their e-mails from overseas, and it may be difficult to prosecute persons who reside offshore.
75% of the U.S. population now has Internet access at home, according to NetRatings.
29% of U.S. homes have a broadband connection, says eMarketer.
http://www.webattack.com/Adwarepop.html