
Submitted by Keerthana Subramani (ASU ID: 1203845157)

Source: Huffington Post, a news website that publishes news articles and user comments.
Data type: Text data.
Structure of pages: HTML.
A seed URL and a search query are specified. URLs are crawled using a depth-first approach and the pages are stored in a database, following links from depth k to k+1, k+2, and so on.
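A minimal sketch of the depth-first crawl, assuming the seed URL is already the search-results page for the query and that pages go into a SQLite table. The names `crawl_dfs`, `fetch`, `pages`, and `max_depth` are hypothetical, and the link regex is a rough stand-in for a real HTML parser. An explicit LIFO stack gives the depth-first order.

```python
import re
import sqlite3
from typing import Callable, List
from urllib.parse import urljoin

# Rough href pattern (hypothetical); a regex stands in for a proper HTML parser.
HREF_RE = re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)


def crawl_dfs(seed_url: str, fetch: Callable[[str], str],
              db: sqlite3.Connection, max_depth: int = 2) -> List[str]:
    """Depth-first crawl from seed_url; store each page with its depth.

    fetch(url) returns the page HTML. It is passed in so the crawler can be
    exercised offline; in practice it would wrap an HTTP request.
    """
    db.execute("CREATE TABLE IF NOT EXISTS pages "
               "(url TEXT PRIMARY KEY, depth INTEGER, html TEXT)")
    visited = set()
    stack = [(seed_url, 0)]  # LIFO stack -> depth-first order
    order = []
    while stack:
        url, depth = stack.pop()
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        html = fetch(url)
        db.execute("INSERT INTO pages VALUES (?, ?, ?)", (url, depth, html))
        order.append(url)
        links = [urljoin(url, href) for href in HREF_RE.findall(html)]
        # Push in reverse so the first link on the page is explored first.
        for link in reversed(links):
            if link not in visited:
                stack.append((link, depth + 1))
    db.commit()
    return order
```

Against a live site, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8", "replace")`. Injecting it keeps the traversal logic separate from networking.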

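Pages are first inspected by eyeballing their source to identify anchors. Each page's source is then accessed using its URL, and regular expressions (REs) are run over the HTML to get the title, timestamp, tags, and article text, which are stored in the database. User comments are also extracted; they are identified by string matching.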
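The extraction step might look roughly like the sketch below. The page markup is not given in the slides, so every tag and class name here (`<time datetime=...>`, `class="tag"`, `article-body`, `class="comment"`) is a hypothetical placeholder for the anchors found by eyeballing. Title, timestamp, tags, and article text use regexes, while comments use plain string matching (`str.find`). The table layout in `store` is also an assumption.

```python
import re
import sqlite3
from typing import List

# Placeholder patterns: the real page markup is not given, so these tags and
# class names are hypothetical stand-ins for anchors found by eyeballing.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.S)
TIME_RE = re.compile(r'<time[^>]*datetime="([^"]+)"')
TAG_RE = re.compile(r'<a[^>]*class="tag"[^>]*>(.*?)</a>', re.S)
# Non-greedy match stops at the first </div>, so nested divs would truncate it.
BODY_RE = re.compile(r'<div class="article-body">(.*?)</div>', re.S)


def find_between(html: str, start: str, end: str) -> List[str]:
    """Plain string matching: every substring between start and end markers."""
    found, pos = [], 0
    while True:
        i = html.find(start, pos)
        if i == -1:
            return found
        i += len(start)
        j = html.find(end, i)
        if j == -1:
            return found
        found.append(html[i:j].strip())
        pos = j + len(end)


def extract(html: str) -> dict:
    """Run the regexes over raw HTML; comments use string matching instead."""
    title = TITLE_RE.search(html)
    ts = TIME_RE.search(html)
    body = BODY_RE.search(html)
    return {
        "title": title.group(1).strip() if title else None,
        "timestamp": ts.group(1) if ts else None,
        "tags": [t.strip() for t in TAG_RE.findall(html)],
        "article": body.group(1).strip() if body else None,
        "comments": find_between(html, '<p class="comment">', "</p>"),
    }


def store(db: sqlite3.Connection, url: str, rec: dict) -> None:
    """Save one extracted article (tags comma-joined) and its comments."""
    db.execute("CREATE TABLE IF NOT EXISTS articles "
               "(url TEXT PRIMARY KEY, title TEXT, ts TEXT, tags TEXT, body TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS comments (url TEXT, text TEXT)")
    db.execute("INSERT INTO articles VALUES (?, ?, ?, ?, ?)",
               (url, rec["title"], rec["timestamp"],
                ",".join(rec["tags"]), rec["article"]))
    db.executemany("INSERT INTO comments VALUES (?, ?)",
                   [(url, c) for c in rec["comments"]])
    db.commit()
```

Regexes over HTML are brittle (nested elements, attribute order), which is why the patterns must be re-checked whenever the site's layout changes.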

The dataset is queried by timestamp, and a timeline of events is created from the timestamp of each record.
Software used: Timeline creator, which shows the date and title of each item and the occurrence of events over time.
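This sketch does not reproduce the Timeline creator tool. It only illustrates querying by timestamp and preparing the date/title data such a tool would display. It assumes a hypothetical `articles(title, ts, ...)` table with ISO-8601 timestamps in a single time zone, so ordering by the text column matches chronological order. `build_timeline` and `render` are illustrative names.

```python
import sqlite3
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional


def build_timeline(db: sqlite3.Connection, start: Optional[str] = None,
                   end: Optional[str] = None) -> Dict[str, List[str]]:
    """Query stored articles by timestamp and group their titles by date.

    Assumes ISO-8601 timestamps in one time zone, so string order in
    ORDER BY matches chronological order.
    """
    sql = "SELECT ts, title FROM articles WHERE ts IS NOT NULL"
    params: List[str] = []
    if start is not None:
        sql += " AND ts >= ?"
        params.append(start)
    if end is not None:
        sql += " AND ts < ?"
        params.append(end)
    sql += " ORDER BY ts"
    timeline: Dict[str, List[str]] = defaultdict(list)
    for ts, title in db.execute(sql, params):
        day = datetime.fromisoformat(ts).date().isoformat()
        timeline[day].append(title)
    return dict(timeline)  # insertion order follows ORDER BY


def render(timeline: Dict[str, List[str]]) -> str:
    """One line per date: the date, how many events occurred, and their titles."""
    return "\n".join(f"{day} ({len(titles)}): {'; '.join(titles)}"
                     for day, titles in timeline.items())
```

The per-date count gives the "occurrence of events over time"; the `start`/`end` bounds narrow the query to a time window.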
