Explain the meaning of the term ‘argument’ in logic. How does it differ from the second meaning of the term that we commonly use?
January 15, 2018
Assignment 2) Think slavery is a thing of the past
January 15, 2018

CS332 Programming Assignment

CS332 Programming Assignment

Introduction:
A web search engine performs two types of tasks concurrently: 1) one type downloads the contents of web pages, and 2) one type parses the contents of the web pages and keeps track of link counts to determine how important a web page is.

In this project, you will be implementing a simple web crawler, where the “web pages” are stored in local files in a simplified format. The web crawler will consist of two sets of concurrent threads: one set of threads for reading in the files (downloaders), and another set of threads for parsing the files (parsers).

Queues:
A downloader will send work to the parsers using a fixed-size queue (also known as a bounded buffer). A downloader will add new pages to be parsed to the queue, and a parser will remove pages from the queue. A downloader must wait if the queue is full, and a parser must wait if the queue is empty. Since the queue is a shared data structure, you will need to use mutual exclusion and condition variables to ensure correct execution.

Link counts:
The parsers maintain a count each time a link is referenced in a page.

File Format:
Files representing web pages will be text files named using the following convention: page1, page2, page3, …, pageN. Links in the file are identified by the LINK keyword as follows LINK:<page-name>, where <page-name> is page1, page2, …, pageN.

Phasing:
The project will be done in two phases.

Phase 1:
In Phase 1, execution will be iterative (sequential). There will be no threads and concurrency and hence no need for locks or condition variables.

Create a bounded buffer called pages consisting of pages to be parsed, along with the operations to add and remove an item from the buffer (add and remove).

Create a downloader function to do the following:
• read the contents of the next page into memory
• add the contents to the page queue

Create a parser function to do the following
• remove a page from the page queue
• parse the content for links
• maintain a count of the times each link has been referenced
• print links encountered as they are traversed, along with current link counts as follows:

page1 -> page2 (N), where N represents the current # of links to this page
page1 -> page3
page2 -> page5
page3 -> page5

Main routine:
• Read an argument from the command line indicating the size of the page buffer (default = 1).
• Run the downloader function and the parser function iteratively in a loop until all file page contents parsed.
• Print out the total link counts indicating how often each page was referenced in the web graph, in the form:

Total Link Counts:
page2 (1)
page5 (4)
etc

Implementation Assumptions:
— You can assume the number of files to be read by the downloader is a maximum of 10, i.e. page 1, page2, page3, … ,page10
— You can assume there are no cycles in the web graph
— File lengths are short (< 1K)

Testing:
— Files and file content for testing the program will be provided closer to the hand-in date

Phase 2:

The Phase 1 program is modified to support concurrent execution of the downloader and parser routines using threads. A Lock and two Condition Variables are used to ensure correct execution.

Add the following to Phase 1 program:
• Read two additional arguments from the command line indicating the number of downloader and parser threads to be created (default= 1 downloader thread and 1 parser thread)
• Create the number of specified threads for the downloader and parser
• Add a Lock for Mutual Exclusion and Condition Variables to check for full and empty conditions in downloader and parser routines

Programming Languages:
C, C++ (you need to be able to make use of the thread synchronization primitives natively).

Deliverables:
You will hand in your source files for both the Phase 1 implementation (sequential execution, no threads) and the Phase 2 (multi-threaded) implementation. Both are due on the same due date.

In your AFS user directory, each of you should see a symbolic link to a directory of the form:
s17.cs.332.006

The source files needs to be copied to this directory.

In addition to the source files, there also should be a text file called “README” that details the exact command lines that should be entered at the prompt to compile and run each program. You should also explain the default behavior with no arguments (e.g. buffer size is assumed to be 1, 1 thread for downloader and 1 for parser), as well as the arguments necessary to change each of these values.

In summary, please include the following in the above directory:

  1. README file explaining the following
    a. Command line to compile Phase 1 program
    b. Command line to run Phase1 program (including arguments)
    c. Command line to compile Phase 2 program
    d. Command line to run Phase 2 program (including arguments
  2. Program file for Phase 1
  3. Program file for Phase 2

Also, please print out the program files for Phase 1 and Phase 2 and submit these in class on the due date.

Thread API Reference:
See Arpaci-Dusseau, Chap 27 for information about the Thread API including API for Locks and Condition Variables.
Also Linux man pages.
Implementation Hints:

Example of Bounded Buffer Implementation:
Reference: Arpaci-Dusseau, Chapter 30, Figure 30.1 ; Silberschatz et al, Chapter 5.1

int buffer[MAX];
int in = 0;
int out = 0;
int count = 0;

void add (int value) {
buffer[in] = value;
in = (in + 1) % MAX;
count++;
}

int remove () {
int tmp = buffer[out];
out = (out + 1) % MAX;
count–;
return tmp;
}Thread-safe System Library Routines:
If you are using library routines, such as strtok() for parsing, make sure to use the re-entrant, thread-safe versions, when necessary e.g. strtok_r().

 

"Are you looking for this answer? We can Help click Order Now"

assignment help