
26th IEEEP Students Seminar 2011 Pakistan Navy Engineering College National University of Sciences & Technology

Semantic Based Multimedia Analysis and Retrieval


Sana Aslam, Mahwash Makhdoom, Madeeha Khan, Amna Basharat FAST National University of Computer & Emerging Sciences Department of Computer Sciences AK Brohi Road, H-11/4 Islamabad, Pakistan sanaaslam_01@yahoo.com, mahwash08@gmail.com, madeehatanveer@gmail.com, amna.basharat@nu.edu.pk
Abstract: The volume of multimedia files is increasing day by day. With the advancement of religious e-learning and multimedia knowledge resources in particular, effective search and retrieval methods are now needed in this area. Many renowned scholars from all over the world deliver lectures in which they discuss diverse topics, and these lectures are usually hours long; navigating manually to a particular segment within a multimedia file is therefore difficult and time consuming. Efficient search methods are needed that let the user select the topic of concern within a multimedia file with a single click. Moreover, users should be able to query in natural language instead of relying on keyword-based search, so that retrieval improves and results reflect content and context. In this paper we propose a method that enables users to navigate to a particular segment within a multimedia file. The architecture of Semantic based Multimedia Analysis and Retrieval (SMART) allows content indexing and time-stamped alignment of transcriptions with the multimedia file. It incorporates natural language processing techniques for query handling and modern semantic web technologies for search. Search will be handled by modeling the knowledge base with ontologies, which support machine interoperability and help retrieve relevant results. The architecture envisions combining natural language processing techniques with modern semantic web technologies and applying them in a domain that opens new ways of knowledge sharing and information retrieval.

Keywords: Natural Language Processing, Transcription Alignment, Islamic Scholarly Lectures, Multimedia Segment Navigation, Ontology

I. INTRODUCTION

As users' needs change, new trends in storing information have been introduced and developed. Advances in e-learning have made multimedia resources a very valuable source of knowledge and information. Today's search engines do not enable users to search for a particular segment within a media file; furthermore, search is at present performed on the basis of the text associated with the media files. The major issue with the current process is that the search results have high recall but low precision [1], whereas contemporary users demand efficient and precise information. Multimedia data is usually detailed and lengthy, and different topics are often discussed in a single file, so it is tedious for a user to manually search for a particular topic or find the answer to a particular question within the file. With the advancement in this field, users increasingly expect the most relevant, accurate and precise data when they query. Keyword-based search provides hundreds or thousands of hits, and it is frustrating and tiresome for users to find the relevant content among them. To overcome this problem, current search techniques need to be remodeled so that users can not only retrieve the most accurate results by querying in natural language but also receive the exact content that matches their search criteria. A number of search engines are presently working to incorporate semantics in their architecture; for example, Hakia [2] is a semantic based image retrieval search engine, and True Knowledge [3] enables users to query in natural language and then returns a precise answer. Still, significant effort is needed to incorporate semantics for efficient retrieval of multimedia data resources, especially in the domain of Islamic scholarly lectures. The architecture of SMART uses the ELAN tool to align transcriptions with the multimedia file and then uses modern semantic technologies to annotate that data so that relevant data retrieval is achieved [4]. Studies have shown that the use of semantics can strengthen the capabilities of e-learning [7, 8] and can support knowledge virtualization.

In SMART we have transcribed the media file and then time-aligned the media file with the corresponding text. Metadata containing information about the file is attached to the media file. A knowledge base is attached to the system.

The tags are generated with the help of the transcribed file and the knowledge base. The search is performed using the tags, and the media file is returned to the user, positioned at the segment the user asked for. To test our approach, we have chosen Islamic scholarly lectures as our target domain; the domain concepts represented in the ontologies therefore concern the views of different Islamic scholars. The main aim of SMART is to open new ways of incorporating technology in the Islamic world and to bridge the gap between the two, so that people can understand the concepts within the religion with ease.

Structure of this paper: Section 2 describes the motivation behind the project and the challenges associated with working in this domain. Section 3 describes the design goals, the detailed architecture of SMART, and its implementation. Section 4 discusses the implementation details. Conclusions and future work are discussed in Section 5.

II. MOTIVATION BEHIND WORKING IN THIS DOMAIN AND CHALLENGES

A. Motivation

The motivating factors behind carrying out this project are:

To enable users to retrieve relevant information from multimedia files. We would achieve this by pruning the irrelevant results; the search results would be fewer but more precise and relevant.

One of the main targets is to let users obtain the views of different scholars on diverse topics in less time, by reducing the query time and by enabling them to query in natural language.

Previously, keyword-based search has been performed on text resources and multimedia lectures. Semantic search techniques have not been applied to multimedia files until now, so we hope that SMART will open new ways of efficient and meaningful search in this area.

B. Challenges Associated

The challenges associated with this project are as follows:

One of the basic tasks is to convert the multimedia file into text. One way to achieve this is through speech recognition, but when tested, its results were not up to the mark, because the domain contains words that are not frequently spoken in English, and the videos also contained many Arabic words. The accuracy achieved through speech recognition was between 35% and 45%, which was too low to work with, since the search process depends on the text associated with the media file.

Another challenge is that no previous work has been done in this area; moreover, there is complexity resulting from the many diverse views of different scholars on the same topic. Creating a link between them and providing the user with sound results is a challenging task in this domain.

III. SMART DESIGN GOALS AND SMART ARCHITECTURE

A. Design Goals

Existing search engines retrieve multimedia content based on the tags associated with a file, such as its title or the name of the speaker. Helping the user navigate efficiently to a particular segment of interest is a challenging and much-needed capability. The literature suggests that little work has been done in this respect. This research proposes an architecture that lets the user navigate to a particular segment within a media file, and to do so by querying in natural language. The purpose of natural language querying is to target and return the content the user wants more precisely, and to prune results that are of no use to the user.
Further elaboration of the goals on which the architecture of SMART is based is provided below.

1) Processing of Textual Content of Multimedia Resources: SMART should be able to process and align the textual content, i.e. the transcriptions associated with a multimedia file, efficiently, so that the timestamps associated with the text of the file are acquired. Time-stamped information helps to create a link between the text of the file and the multimedia content.

2) Automatic Tagging of New Multimedia Files Added in Repository: The knowledge base comprises the most commonly occurring terms in this domain. Whenever a new file is added to the repository, SMART should be able to automatically tag the segments of the file that contain those domain terms and save their timestamps.

3) Natural Language Processing: This research is envisioned to let the user query in natural language, as sentences in English. SMART should be able to process the query and apply Natural Language Processing techniques [9] to the query structure so that the meaning of the query is extracted and precisely associated with the text of the media.

4) Ontological Knowledge Model:

To enable efficient search, data needs to be modeled as knowledge, so that an ontological model of knowledge can be designed for this particular domain of Islamic scholarly lectures. This ontological model would capture information such as speaker, topic, etc.

5) Intelligent Retrieval of Information: SMART should be able to retrieve efficient and meaningful results by pruning the irrelevant hits and providing the user with only the most precise results.

B. High Level System Architecture of SMART

The high level system architecture of SMART is shown in Figure 1. It comprises five major activities: Transcription Alignment, Query Processing, Knowledge Extraction, Knowledge Modeling, and Query-Result Accuracy Analysis. In the first phase, the multimedia resources are aligned with the transcriptions, which are treated as the repository of SMART. The Query Processing unit applies Natural Language Processing techniques to the user query and extracts its meaning so that it can be mapped accurately onto the transcribed data. The Knowledge Extraction unit comprises metadata generation and tag generation, in which the transcribed, aligned data is parsed in a way that incorporates data about the data, i.e. its metadata. In the tag generation unit, tags are generated on the parsed data to initiate the navigation process. Knowledge Modeling builds the respective ontology models for religious scholarly texts using the ontology schemas. The ontology repository stores the ontologies. Together, the ontology repository and the metadata repository form the knowledge base of the system, which will be used for efficient search and retrieval. Semantic web technologies are incorporated to speed up the search process and reduce the response time of the overall process. In the final phase, Query-Result Accuracy Analysis, the compatibility of the query and the extracted result, and the accuracy of that result, are determined using natural language techniques and by analyzing the thematic coherence between the query and the results. Finally, the results are returned to the user and displayed through the user interface of the SMART application. The results will comprise a list of different speakers and the files associated with them, with accurate timestamps of where the answer to the user query occurs within the multimedia stream.

Figure 1: High level System Architecture for Multimedia Segment Navigation

The detailed architecture of SMART, on which the design and implementation details are based, is discussed in the following subsections:

1) Transcription Alignment Unit: As discussed above, one challenge associated with SMART is converting the audio into text. The complexity lies in the fact that the scholarly lectures, although in English, contain Arabic terms and some Urdu terms, so automatic recognition cannot achieve adequate accuracy, and the risk of error cannot be neglected in a domain as sensitive as religious lectures. For this reason, manual transcriptions are generated for each multimedia file. The transcriptions are then aligned with the multimedia stream using ELAN, an open source tool for aligning transcriptions with multimedia. The importance of this unit lies in the fact that the more accurate the alignment, the more efficient the search process. ELAN aligns transcriptions along with timestamps, which are required for segment navigation [5].

2) Knowledge Extraction Unit:

This unit consists of two components: Metadata Generation and Tag Generation. Both are described briefly below.

3) Metadata Generation: This component is responsible for generating the metadata of the multimedia files. The metadata holds information about a file, such as its name, its topic, and its location [6].

4) Tag Generation: The tag generation unit is responsible for generating tags for the multimedia file by identifying useful tags with the help of the knowledge base.

5) Query Processing Unit: The user query would be passed on to the Query Processing Engine, which would extract useful keywords from it. Here, Natural Language Processing techniques would be applied to the user query to understand its syntax and semantics.

6) Knowledge Base: The knowledge base would contain the most frequently used words in the Islamic domain. With the help of these words, the tags for a particular video would be generated.

7) Knowledge Modeling Unit: In this unit, the data will be transformed into knowledge models with the help of the generated metadata and the use of ontologies. The ontological model of the data will be stored in this unit and would comprise schemas that incorporate semantics in the data for efficient search and retrieval. The user query from the Query Processing Unit will be mapped onto the data to find the exact location holding the answer to that query. Because the data has already been shaped into ontologies, the overall process of knowledge extraction and acquisition is sped up. The use of ontologies facilitates machine interoperability and conceptualizes the domain in a format that is understood by the machine [10].

8) Query-Result Compatibility Analysis Unit: In this unit the query and its corresponding mapping obtained in the data would be verified. This is necessary because the domain under consideration is very sensitive, and there is a risk in returning results to the user without proper validation and verification. From this unit, the verified results would be returned to the user interface of the SMART application.

IV. IMPLEMENTATION

The implementation of the SMART architecture is divided into two modules. The first is the navigation strategy, and the second is the use of NLP techniques with the incorporation of semantics. So far we have implemented the first module, i.e. segment navigation within the multimedia stream based on keywords. The subsections of this module are elaborated as follows:

A. Navigation Strategy

It comprises transcription generation, alignment with the multimedia stream, and keyword-based navigation of the multimedia stream. The three subsections are discussed in detail as follows:

1) Transcription Generation: Before discussing transcription generation in detail, it is important to understand why transcriptions are used when many speech recognition engines are available these days. The domain we are targeting contains Arabic concepts and some Urdu terms even when the whole lecture is in English. This raises the challenge of recognizing a multilingual stream, which to date cannot be done efficiently. Another issue is that the speech recognition engines available today rely on machine learning techniques; this approach works well when there is a single speaker, because the machine has to be trained on that speaker. Moreover, due to the diversity in dialects and pronunciation of various speakers, it is very difficult to recognize the spoken words accurately [11]. This domain is so sensitive that users need different views to understand the concepts within the religion, and training on a single speaker would restrict the domain to that speaker, which would not benefit users who want to know the views of different scholars on the same topic. To deal with these issues, we have transcribed the multimedia files manually.

2) Alignment with Multimedia Stream: Navigation within the stream is possible if we have time-stamped information about the topics discussed in the multimedia file. This requires aligning the transcriptions with the multimedia stream, for which we have used the ELAN tool. ELAN (the EUDICO Linguistic Annotator) is a program that allows aligning transcriptions with, and adding annotations to, a video file. ELAN aligns the transcriptions with the media file and returns the time-stamped data, i.e. the words spoken in the video along with the times at which they were spoken [5].

3) Keyword Based Navigation: In this unit of the architecture, keyword-based search is implemented. Here we discuss in detail the working of the Knowledge Extraction unit of the SMART architecture. The knowledge extraction unit consists of two subunits: one is the tag generation unit, and the other attaches the metadata to the corresponding media file. The tag generation unit takes the time-aligned data file as input. The tags are generated with the help of the knowledge base, which consists of the most commonly used words in the Islamic domain, so that relevant tags can be added to the multimedia files. A tagged repository is maintained, which consists of metadata. The metadata comprises the keyword information and the corresponding media file in which each keyword occurs. The metadata associated with a transcription consists of detailed information about the keywords contained in the knowledge base, their corresponding timestamps, and the path of the multimedia file containing those keywords. Figure 2 shows the detailed working of tag generation. The workflow of the components on the basis of the algorithm is shown in Figure 3.
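To make the tag generation and keyword lookup concrete, the following is a minimal Python sketch, not the implementation used in SMART. It assumes the time-aligned transcript is already available as a list of records; the names `Segment`, `Tag`, `generate_tags` and `search`, as well as the sample paths, speakers and times, are hypothetical. The lookup follows the outline of Algorithm 1: an optional speaker filter, followed by a keyword search that returns file paths and timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One time-aligned transcript line (assumed export of the alignment step)."""
    media_path: str   # path of the multimedia file (hypothetical)
    speaker: str      # speaker name taken from the file's metadata
    start_ms: int     # time at which the text starts, in milliseconds
    text: str         # transcribed words for this stretch of the recording


@dataclass(frozen=True)
class Tag:
    """A knowledge-base keyword located in a media file."""
    keyword: str
    media_path: str
    speaker: str
    start_ms: int


def generate_tags(segments, knowledge_base):
    """Emit one Tag per knowledge-base keyword found in each segment.

    Only single-word keywords are matched in this sketch.
    """
    keywords = {k.lower() for k in knowledge_base}
    tags = []
    for seg in segments:
        # Normalise words: drop surrounding punctuation and ignore case.
        words = {w.strip(".,;:!?\"'()").lower() for w in seg.text.split()}
        for keyword in sorted(words & keywords):
            tags.append(Tag(keyword, seg.media_path, seg.speaker, seg.start_ms))
    return tags


def search(tags, query, speaker=None):
    """Keyword lookup with an optional speaker filter, ordered by file and time."""
    wanted = query.strip().lower()
    hits = [
        t for t in tags
        if t.keyword == wanted and (speaker is None or t.speaker == speaker)
    ]
    return sorted(hits, key=lambda t: (t.media_path, t.start_ms))


# Illustrative data only; names, paths and times are invented.
SEGMENTS = [
    Segment("lectures/a.mp4", "Speaker A", 0, "Welcome to this lecture."),
    Segment("lectures/a.mp4", "Speaker A", 65_000, "Zakat is an obligation."),
    Segment("lectures/b.mp4", "Speaker B", 12_000, "Let us discuss zakat and fasting."),
]
KNOWLEDGE_BASE = ["zakat", "fasting", "prayer"]

TAGS = generate_tags(SEGMENTS, KNOWLEDGE_BASE)

if __name__ == "__main__":
    for hit in search(TAGS, "zakat"):
        print(hit.media_path, hit.start_ms)
```

Each hit carries the media path and the start time, which is what a player needs in order to seek to the requested segment. A real repository would more likely keep the tags in a database than in an in-memory list.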

Figure 2: Detailed Architecture of the Tag Generation Unit

Figure 3: Detailed Architecture of the Search Engine

With the acquisition of tagged information, it is now possible to navigate to a particular segment within the multimedia file. The detailed algorithm of the implemented search process is given below.

ALGORITHM 1: NAVIGATION WITHIN MULTIMEDIA STREAM

1. Initialize UserQuery to U
2. Initialize SelectedSpeaker to S
   Input the user query
3. if the user selects the speaker
4.     Store speaker name in Temp
5.     Retrieve the names of multimedia files of the corresponding speaker from the meta-data file
6.     Search the database for the USERQUERY WHERE multimedia file name belongs to retrieved list
7.     Retrieve the results, their corresponding timestamps and multimedia file names
8. else
9.     Search the database for the USERQUERY
10.    Retrieve the results, their corresponding timestamps and multimedia file names
11. Display the retrieved results to the users
12. User selects the multimedia file and plays it

B. Incorporation of Semantics

The second module of the implementation of SMART incorporates semantics in the architecture for efficient search and uses NLP techniques for query processing.

V. CONCLUSIONS & FUTURE WORK

In this paper we have proposed a potentially powerful and novel approach to the retrieval of multimedia information. The crux of our innovation is a procedure through which a particular segment can be retrieved from a multimedia file. We have used the domain of Islamic scholarly lectures for demonstration, but our results can be generalized and applied to other domains as well. Moreover, speech recognition does not prove to be a viable approach in this domain, due to the variety of words spoken in different languages within a single lecture; this is why working from transcriptions is necessary for effective search. Combined with modern semantic technologies, we are hopeful that SMART will prove to be an efficient and effective search engine for multimedia files in comparison with other semantic search engines. Although we are confident that the conceptual framework for this project is sound and that its implementation is feasible from a technical standpoint, some important aspects remain to be covered in future. These include adding semantics to achieve efficiency and effectiveness in retrieving results. Moreover, until now we have been working with videos in English; later on we would like to incorporate videos in Urdu as well, because the domain has a vast amount of data in Urdu that could be used as a valuable resource of knowledge and information. In future we would also work on user studies and evaluation.

REFERENCES
[1] L. Khan and D. McLeod, "Effective Retrieval of Audio Information from Annotated Text Using Ontologies," ACM SIGKDD Workshop on Multimedia Data Mining, Boston, MA, August 2000.
[2] http://www.hakia.com [Accessed 15 September 2010]
[3] http://www.trueknowledge.com/ [Accessed 28 October 2010]
[4] Y. Xiao, M. Xiao, and F. Zhang, "Agents-Based Intelligent Retrieval Framework for the Semantic Web," in Proc. WiCom, 2007, pp. 5357-5360.
[5] http://www.lat-mpi.eu [Accessed 15 November 2010]
[6] R. Guenther and J. Radebaugh, Understanding Metadata. Bethesda: NISO, 2004.
[7] Y. Li and M. Dong, "Towards a Knowledge Portal for E-Learning Based on Semantic Web," in Proc. 8th IEEE Int. Conf. on Advanced Learning Technologies, ICALT'08, 2008, pp. 910-912.
[8] N. Henze, P. Dolog, and W. Nejdl, "Reasoning and Ontologies for Personalized E-Learning in the Semantic Web," Educational Technology & Society, pp. 82-97.
[9] O. Kucuktunc, U. Gudukbay, and O. Ulusoy, "A natural language-based interface for querying a video database," IEEE MultiMedia, 14(1):83-89, 2007.
[10] H. Alani, S. Kim, D.E. Millard, M.J. Weal, W. Hall, P.H. Lewis, and N.R. Shadbolt, "Automatic Ontology-Based Knowledge Extraction from Web Documents," Proc. IEEE Intelligent Systems, 2003, pp. 14-21.
[11] M. Forsberg, "Why is Speech Recognition Difficult," Chalmers University of Technology, 2003.
