
Big data processing projects transform streams of unstructured or semi-structured data, such as data from social media sites or system-generated log files, into structured files that form a database or collection to query. Such structured files could be tables in an RDBMS, key-value stores, JSON files, or CSV (comma-separated values) files for MongoDB, Cassandra, or VoltDB, or a document collection in HBase on HDFS.
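As a rough illustration of this kind of transformation, the sketch below flattens semi-structured JSON records into CSV rows with a fixed set of columns. The field names (id, user, text, tags) and the sample records are hypothetical and are not taken from any real social media API; actual feeds will need their own mapping.

```python
import csv
import io
import json

# Hypothetical semi-structured input: one JSON object per line, with
# nested and optional fields (the field names are illustrative only).
RAW = """\
{"id": 1, "user": {"name": "alice"}, "text": "hello", "tags": ["a", "b"]}
{"id": 2, "user": {"name": "bob"}, "text": "no tags here"}
"""

COLUMNS = ["id", "user_name", "text", "tags"]


def flatten(record):
    """Map one nested record onto the fixed column set."""
    return {
        "id": record.get("id"),
        "user_name": record.get("user", {}).get("name", ""),
        "text": record.get("text", ""),
        # Join the list into one cell; a separate table is the other option.
        "tags": ";".join(record.get("tags", [])),
    }


def to_csv(raw_lines):
    """Return CSV text with a header row for the given JSON lines."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for line in raw_lines.splitlines():
        if line.strip():
            writer.writerow(flatten(json.loads(line)))
    return out.getvalue()


csv_text = to_csv(RAW)
print(csv_text)
```

Missing fields become empty cells rather than errors, which is one possible policy; a project could equally reject incomplete records or store list-valued fields in a separate table.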

1. LinkedIn data transformation into one of the following platforms:

1-1) Tables in RDBMS (MS SQL Server using LINQ or any database server with Java/JDBC) or VoltDB.

1-2) CSV (comma-separated values) files or any structured files (key-value store, JSON) in Hadoop to query with any applicable tools such as Flume, Hive, Pig Latin, HBase, MongoDB, and more.

Related papers to read:

Avatara: OLAP for Webscale Analytics Products. Lili Wu, Roshan Sumbaly, Chris Riccomini, Gordon Koo, Hyung Jin Kim, Jay Kreps, and Sam Shah, LinkedIn.

The “Big Data” Ecosystem at LinkedIn. Roshan Sumbaly, Jay Kreps, and Sam Shah, LinkedIn.

2. Facebook Timeline message data transformation into one of the following platforms:

2-1) Tables in RDBMS (MS SQL Server using LINQ or any database server with Java/JDBC) or VoltDB.

2-2) Key-value stores, JSON, CSV (comma-separated values) files, or any structured files in Hadoop to query with any applicable tools such as Flume, Hive, Pig Latin, HBase, MongoDB, Cassandra, and more.

Related papers to read (will be given):

Petabyte Scale Databases and Storage Systems Deployed at Facebook. Dhruba Borthakur.

3. Facebook Friends Social Network (Graph API) data transformation into one of the following platforms:
CIS 612 Sunnie S Chung Cleveland State University

3-1) Tables in RDBMS (MS SQL Server using LINQ or any database server with Java/JDBC), or VoltDB

3-2) Key-value stores, JSON, CSV (comma-separated values) files, or any structured files in Hadoop to query with any applicable tools such as Flume, Hive, Pig Latin, HBase, MongoDB, Cassandra, and more.

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo (Facebook) et al., SIGMOD 2010.

http://hive.apache.org/

Muppet: MapReduce-Style Processing of Fast Data. Wang Lam, Lu Liu, STS Prasad, Anand Rajaraman, Zoheb Vacheri, and AnHai Doan, WalmartLabs and University of Wisconsin-Madison.

4. Twitter message data transformation into one of the following platforms:

4-1) Tables in RDBMS (MS SQL Server using LINQ or any database server with Java/JDBC), or VoltDB.

4-2) Key-value stores, JSON, CSV (comma-separated values) files, or any structured files in Hadoop to query with any applicable tools such as Flume, Hive, Pig Latin, HBase, MongoDB, Cassandra, and more.

Related papers to read (will be given):

The Unified Logging Infrastructure for Data Analytics at Twitter. George Lee, Jimmy Lin, Chuang Liu, Andrew Lorek, and Dmitriy Ryaboy, Twitter, Inc.

Fast Data in the Era of Big Data: Twitter’s Real-Time Related Query Suggestion Architecture. Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, and Jimmy Lin, Twitter, Inc.

5. Transformation of log files from any system into one of the following platforms:

5-1) Tables in RDBMS (MS SQL Server using LINQ or any database server with Java/JDBC), or VoltDB.

5-2) Key-value stores, JSON, CSV (comma-separated values) files, or any structured files in Hadoop to query with any applicable tools such as Flume, Hive, Pig Latin, HBase, MongoDB, Cassandra, and more.
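For the log file project, a first step might look like the sketch below, which parses raw log lines into JSON documents (one per line) that a document store or Hive could then load. The log layout, the sample lines, and the field names are all assumptions made for illustration; real system logs vary widely and usually need their own pattern.

```python
import json
import re

# Assumed line layout: "<date> <time> <LEVEL> <component>: <message>".
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<component>[\w.-]+): (?P<message>.*)$"
)

# Hypothetical sample log, including one line that does not fit the layout.
SAMPLE_LOG = """\
2024-01-15 10:32:07 INFO web: GET /index.html 200
2024-01-15 10:32:09 ERROR auth: login failed for user=alice
this line does not match the layout
"""


def parse_log(text):
    """Split log text into (structured records, unparsed lines)."""
    records, rejected = [], []
    for line in text.splitlines():
        match = LOG_PATTERN.match(line)
        if match:
            records.append(match.groupdict())
        else:
            rejected.append(line)
    return records, rejected


records, rejected = parse_log(SAMPLE_LOG)
for record in records:
    print(json.dumps(record))  # one JSON document per line
```

Keeping the unparsed lines in a separate list, instead of silently dropping them, makes it easier to see how much of a log the chosen pattern actually covers.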

6. Transformation of any online electronic books into one of the following platforms:

6-1) Tables in RDBMS (MS SQL Server using LINQ or any database server with Java/JDBC), or VoltDB.

6-2) Key-value stores, JSON, CSV (comma-separated values) files, or any structured files in Hadoop to query with any applicable tools such as Flume, Hive, Pig Latin, HBase, MongoDB, Cassandra, and more.
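The RDBMS option that appears in each project can be sketched in miniature as follows, here for the e-book case: plain text is split into rows, loaded into a table, and queried with SQL. This uses Python's built-in sqlite3 module purely as a stand-in for MS SQL Server, VoltDB, or a JDBC-connected server; the table name, its columns, the chapter heading convention, and the sample text are all hypothetical.

```python
import sqlite3

# Hypothetical plain-text book; real e-books would need their own
# format handling (EPUB, HTML, PDF) before this step.
BOOK_TEXT = """\
CHAPTER 1
The ship left the harbor at dawn.
Nobody on the pier waved.

CHAPTER 2
By noon the coast was out of sight.
"""


def book_rows(text):
    """Yield (chapter, line_no, line) for each non-empty body line."""
    chapter, line_no = 0, 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.upper().startswith("CHAPTER "):
            # A heading starts a new chapter and resets the line counter.
            chapter, line_no = chapter + 1, 0
            continue
        line_no += 1
        yield chapter, line_no, line


conn = sqlite3.connect(":memory:")  # in-memory database, nothing on disk
conn.execute(
    "CREATE TABLE book_line (chapter INTEGER, line_no INTEGER, content TEXT)"
)
conn.executemany("INSERT INTO book_line VALUES (?, ?, ?)", book_rows(BOOK_TEXT))
conn.commit()

# Example query: number of lines per chapter.
lines_per_chapter = conn.execute(
    "SELECT chapter, COUNT(*) FROM book_line GROUP BY chapter ORDER BY chapter"
).fetchall()
print(lines_per_chapter)
```

The same pattern (parse into tuples, bulk insert, then query) carries over to the other data sources; only the parsing function and the table schema would change.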
