pset6: Mispellings

Tommy MacWilliam [email protected]

October 23, 2011

Today’s Music

- Epic Music
  - Don’t Touch This (Busta Rhymes feat. Travis Barker)
  - Lux Aeterna (Clint Mansell)
  - Tapp (3OH!3)
  - 300 Violin Orchestra (Jorge Quintero)
  - Beaumont (3OH!3)

Today

- speller.c
- linked lists
- hash tables
- tries

speller.c

- calls load() on dictionary file
  - dictionary contains valid words, one per line
- iterates through words in file to spellcheck, calls check() on each word
- calls size() to determine number of words in dictionary
- calls unload() to free up memory

dictionary.c

- we must implement load(), check(), size(), and unload()
- high-level overview:
  - given a list of correctly-spelled words in a dictionary file, load them all into memory
  - for each word in some text, spell-check each word
  - if word from text is found in memory, it must be spelled correctly
  - if word from text is not found in memory, it cannot be spelled correctly

speller.c

- example time!

Linked Lists

- each node contains a value and a pointer to the next node
- need to maintain a pointer to the first node
- last node points to NULL

Creating Linked Lists

    typedef struct node
    {
        char *word;
        struct node *next;
    } node;

    node *node1 = malloc(sizeof(node));
    node *node2 = malloc(sizeof(node));
    node1->word = "this";
    node2->word = "is";
    node1->next = node2;
    node2->next = NULL;

Traversing Linked Lists

- create pointer to iterate through list, starting at first element
- loop until iterator is NULL (aka no more elements)
- at every point in loop, iterator will point at an element in the linked list
  - can access any field of that element
- to go to next element, simply move iterator to next

Traversing Linked Lists

    // assuming first points to the first element
    node *iterator = first;
    while (iterator != NULL)
    {
        printf("%s\n", iterator->word);
        iterator = iterator->next;
    }

Freeing Linked Lists

- need to explicitly free() each element in the list
  - but, once you free(), you can’t access next any more
  - determine next node, free() the current node, then move on to next node

Freeing Linked Lists

    // assuming first points to the first element
    node *iterator = first;
    while (iterator != NULL)
    {
        node *n = iterator;
        iterator = iterator->next;
        free(n);
    }
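The three linked-list slides (creating, traversing, freeing) can be put together in one small sketch. The helper names prepend, length, and free_list are made up for illustration and are not part of the pset's distribution code; the struct matches the one shown above, with const added to the word pointer because it points at string literals here.

```c
#include <stdlib.h>

typedef struct node
{
    const char *word;      // points at a string owned elsewhere
    struct node *next;
} node;

// Hypothetical helper: allocate a node for word and put it in front
// of first. Returns the new first node, or NULL if malloc fails.
node *prepend(node *first, const char *word)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->word = word;
    n->next = first;
    return n;
}

// Hypothetical helper: count elements by walking until iterator is NULL.
int length(const node *first)
{
    int count = 0;
    for (const node *iterator = first; iterator != NULL; iterator = iterator->next)
    {
        count++;
    }
    return count;
}

// Hypothetical helper: free every node, returning how many were freed.
int free_list(node *first)
{
    int freed = 0;
    node *iterator = first;
    while (iterator != NULL)
    {
        node *n = iterator;          // remember the current node
        iterator = iterator->next;   // read next BEFORE freeing
        free(n);
        freed++;
    }
    return freed;
}
```

Calling prepend(NULL, "is") and then prepend on the result with "this" gives a two-element list whose first word is "this"; free_list on it reports 2 freed nodes.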

Hash Tables

(image courtesy Wikipedia)

Hash Tables

- fixed number of buckets (aka an array)
- hash function maps each value to a bucket
  - must be deterministic: same value must map to same bucket every time

Hash Function

- outputs a bucket number for each input
- since each input is a word, need to convert a word to an integer
- also make sure integer is a valid bucket number
  - can’t be negative and must be less than the number of buckets, which doesn’t change

Best Hash Function Ever

    int hash(char *name)
    {
        if (strcmp(name, "John Smith") == 0)
            return 152;
        else if (strcmp(name, "Lisa Smith") == 0)
            return 1;
        else if (strcmp(name, "Sam Doe") == 0)
            return 254;
        else if (strcmp(name, "Sandra Dee") == 0)
            return 152;
        else
            return 153;
    }

Still Not a Great Hash Function

    // many words still have same hash value!
    int hash(char *word)
    {
        int result = 0;
        int n = strlen(word);
        for (int i = 0; i < n; i++)
        {
            result += word[i];
        }
        return result % HASHTABLE_SIZE;
    }
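One way to spread words out more is to make the position of each letter matter. The sketch below swaps in a djb2-style multiply-and-add hash, which is not from the slides; the HASHTABLE_SIZE value is an assumption, and folding to lowercase is an added choice so that a case-insensitive check() looks in the same bucket for "Cat" and "cat".

```c
#include <ctype.h>

#define HASHTABLE_SIZE 1000   // assumed bucket count, pick your own

// djb2-style hash (a swapped-in technique, not the slides' function).
// Unsigned arithmetic keeps the result non-negative, and the final
// modulo keeps it below HASHTABLE_SIZE.
unsigned int hash(const char *word)
{
    unsigned long result = 5381;
    for (int i = 0; word[i] != '\0'; i++)
    {
        // fold to lowercase so letter case does not change the bucket
        unsigned char c = (unsigned char) tolower((unsigned char) word[i]);
        result = result * 33 + c;
    }
    return (unsigned int) (result % HASHTABLE_SIZE);
}
```

With the additive version above, anagrams such as "ab" and "ba" always collide; here the multiplication makes the order of letters count, so they usually land in different buckets.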

Collisions

(image courtesy knowyourmeme.com)

Collisions

- what if two values map to the same bucket?
  - can’t just wipe out the other value!
- hash table contains pointers to the start of linked lists instead of words
  - need to traverse every element of the linked list to look for word
  - still MUCH faster than linear searching entire dictionary for every word

Structure

    typedef struct node
    {
        char word[LENGTH + 1];
        struct node *next;
    } node;

    node *hashtable[HASHTABLE_SIZE];

    // hashtable[i] is a pointer to the
    // start of a linked list of all words
    // that hash to i

Reading the Dictionary

- goal: load every word in the dictionary into memory somehow
- need to iterate over each word in dictionary text file
  - iterate over text file with while (!feof(fp)); feof() only becomes true after a read has already failed, so also check what fscanf returns
- each word must be individually inserted into the hash table

Creating Nodes

- malloc a new node* n for each word
- use fscanf to read string from file
  - fscanf(fp, "%s", n->word);
  - reads one word from dictionary at a time
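A minimal sketch of the reading loop, under two assumptions that go beyond the slides: LENGTH is taken to be 45, and the loop is driven by fscanf's return value instead of feof, so a trailing newline at the end of the file does not produce an extra phantom word. The function name count_words is made up for illustration; a real load() would create and insert a node where this sketch only counts.

```c
#include <stdio.h>

#define LENGTH 45   // assumed maximum word length

// Hypothetical helper: count the words in an already-open file,
// reading one word per fscanf call.
int count_words(FILE *fp)
{
    char word[LENGTH + 1];
    int count = 0;

    // %45s stops after LENGTH characters, so word cannot overflow;
    // fscanf returns 1 only when it actually stored a word
    while (fscanf(fp, "%45s", word) == 1)
    {
        count++;
    }
    return count;
}
```

The width in the format string has to be written as a literal number, so it must be kept in step with LENGTH by hand.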

Hashing

- now, n->word contains the word from the dictionary
- now we can hash n->word, since our hash function converts strings to integers
- result of hash function gives bucket in hash table for node
- remember, hash table contains pointers to the start of linked lists

Inserting Nodes

- hashtable[index] == NULL
  - no linked list exists yet
  - make hashtable[index] point to n
  - make n->next point to NULL because it is the last element in the linked list

Inserting Nodes

- hashtable[index] != NULL
  - linked list exists already, so add to the beginning of it
  - adding to the end is much slower!
  - make n->next point to what is already there
  - make hashtable[index] point to n
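The two insertion cases can be handled by the same two assignments: when the bucket is empty, "what is already there" is NULL, which is exactly what the last node's next should be. The sketch below assumes LENGTH is 45 and a 10-bucket table, reuses the additive hash from the earlier slide as a placeholder, and uses a made-up helper name insert.

```c
#include <stdlib.h>
#include <string.h>

#define LENGTH 45           // assumed maximum word length
#define HASHTABLE_SIZE 10   // assumed bucket count

typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
} node;

// file-scope array, so every bucket starts out as NULL
node *hashtable[HASHTABLE_SIZE];

// placeholder hash: sum of characters modulo the table size
unsigned int hash(const char *word)
{
    unsigned int result = 0;
    for (int i = 0; word[i] != '\0'; i++)
    {
        result += (unsigned char) word[i];
    }
    return result % HASHTABLE_SIZE;
}

// Hypothetical helper: copy word into a new node and add it to the
// front of its bucket's list. Returns 1 on success, 0 on failure.
int insert(const char *word)
{
    if (strlen(word) > LENGTH)
    {
        return 0;
    }
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return 0;
    }
    strcpy(n->word, word);

    unsigned int index = hash(word);
    n->next = hashtable[index];   // NULL if the bucket was empty
    hashtable[index] = n;         // n is now the first element
    return 1;
}
```

Inserting "ab" and then "ba" (which collide under the additive hash) leaves "ba" at the front of the bucket with "ab" behind it.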

Size

- size() returns the number of words in the dictionary
  - aka the sum of the number of nodes in your hash table
- just keep a counter as you’re loading words!

Check

- goal: given some word, check if it is in the dictionary
- if word exists in our hash table, it must be spelled correctly
- if it does not exist in our hash table, it cannot be spelled correctly

Check

- don’t need to search entire hash table for word
- only need to search the linked list in bucket hash(word)
- linked list to traverse starts at hashtable[hash(word)]

Check

- we already know how to traverse a linked list!
- at each node in linked list, compare word to input
  - strcmp(string1, string2): returns 0 if string1 and string2 are equal
  - still, spell-checker needs to be case-insensitive!
- if strings are equal, word is spelled correctly
- if end of linked list is reached, word is not spelled correctly
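A sketch of how check() might look under these rules. Because plain strcmp is case-sensitive, the comparison here is done by a small hand-written helper, and the hash lowercases each letter so that "THIS" and "this" reach the same bucket. LENGTH, HASHTABLE_SIZE, and the helper names same_word and add are assumptions for illustration only.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45           // assumed maximum word length
#define HASHTABLE_SIZE 10   // assumed bucket count

typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
} node;

node *hashtable[HASHTABLE_SIZE];

// placeholder hash that ignores case
unsigned int hash(const char *word)
{
    unsigned int result = 0;
    for (int i = 0; word[i] != '\0'; i++)
    {
        result += (unsigned int) tolower((unsigned char) word[i]);
    }
    return result % HASHTABLE_SIZE;
}

// Hypothetical helper: true if a and b are equal ignoring case.
bool same_word(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
        {
            return false;
        }
        a++;
        b++;
    }
    return *a == *b;   // equal only if both strings ended together
}

// Hypothetical helper: add word to the front of its bucket.
bool add(const char *word)
{
    node *n = malloc(sizeof(node));
    if (n == NULL || strlen(word) > LENGTH)
    {
        free(n);
        return false;
    }
    strcpy(n->word, word);
    n->next = hashtable[hash(word)];
    hashtable[hash(word)] = n;
    return true;
}

// Search only the one linked list that word could be in.
bool check(const char *word)
{
    for (node *iterator = hashtable[hash(word)]; iterator != NULL; iterator = iterator->next)
    {
        if (same_word(iterator->word, word))
        {
            return true;    // found: spelled correctly
        }
    }
    return false;           // reached end of list: misspelled
}
```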

Example

(diagram: an empty hash table with 10 buckets, every bucket NULL)

Load

hash("this") == 1

(diagram: bucket 1 now points to a node holding "this"; the other 9 buckets are NULL)

Load

hash("is") == 5

(diagram: bucket 1 points to "this" and bucket 5 points to "is"; the other 8 buckets are NULL)

Load

hash("csfifty") == 5

(diagram: bucket 1 points to "this"; the linked list in bucket 5 holds both "is" and "csfifty"; the other 8 buckets are NULL)

Check

check("isn't");

hash("isn't") == 3

(diagram: same table as above; bucket 3 is NULL, so there is no linked list to search)

Check

check("this");

hash("this") == 1

(diagram: same table as above; the linked list in bucket 1 holds "this")

Check

check("csfifty");

hash("csfifty") == 5

(diagram, shown in two steps: same table as above; the linked list in bucket 5 holds "is" and "csfifty" and is traversed node by node)

Unload

- goal: free() entire hash table from memory
- array allocated with node *array[LENGTH] does not need to be freed
- anything malloc’d must be freed

Unload

    for each element in hashtable
        for each element in linked list
            remember next element
            free element
            move to next element
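The pseudocode above might translate to C roughly as follows. This is a sketch that assumes the hash table layout from the Structure slide, with LENGTH and HASHTABLE_SIZE given assumed values; resetting each bucket to NULL afterwards is an extra precaution, not something the slides ask for.

```c
#include <stdbool.h>
#include <stdlib.h>

#define LENGTH 45           // assumed maximum word length
#define HASHTABLE_SIZE 10   // assumed bucket count

typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
} node;

node *hashtable[HASHTABLE_SIZE];

// Free every node in every bucket. The array itself was not
// malloc'd, so it is not freed.
bool unload(void)
{
    for (int i = 0; i < HASHTABLE_SIZE; i++)
    {
        node *iterator = hashtable[i];
        while (iterator != NULL)
        {
            node *n = iterator;          // current node
            iterator = iterator->next;   // grab next before freeing
            free(n);
        }
        hashtable[i] = NULL;             // bucket is empty again
    }
    return true;
}
```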

Valgrind

- valgrind -v --leak-check=full ./speller ~cs50/pset6/texts/austinpowers.txt
- example time!

Tries

- rather than a single word, nodes contain an array with an element for each possible character
- value of element in array points to another node if corresponding letter is the next letter in any word
- if corresponding letter is not the next letter of any word, that element is NULL
- also need to store if current node is the last character of any word

Structure

    typedef struct node
    {
        bool is_word;
        struct node *children[27];
    } node;

load

- iterate through letters in each dictionary word
- also keep iterator to iterate through trie as you insert letters
- each element in children corresponds to a different letter
- look at value for children element corresponding to current letter
  - if NULL, malloc a new node, point to it, and move iterator to new node
  - if not NULL, simply move iterator to that existing node
- if letter is '\n', mark node as valid end of word

size

- same thing, keep a counter as you load words!

check

- attempt to travel downwards in trie for each letter in input word
- for each letter, go to the corresponding element in children
  - if NULL, word is misspelled
  - if not NULL, go to that pointer and move on to next letter
- if at end of word, check if this node marks the end of a word
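The trie load and check steps can be sketched together. The slides do not say how characters map to the 27 slots, so the layout here (0 to 25 for letters regardless of case, 26 for the apostrophe) is an assumption, as are the helper names slot, new_node, trie_insert, and trie_check. The insert works on one in-memory word at a time and marks the end of the word when the string ends, standing in for the '\n' test used when reading a file.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    bool is_word;
    struct node *children[27];
} node;

// Assumed layout: 'a'..'z' -> 0..25 (case ignored), '\'' -> 26,
// anything else -> -1.
int slot(char c)
{
    if (c == '\'')
    {
        return 26;
    }
    int lower = tolower((unsigned char) c);
    if (lower >= 'a' && lower <= 'z')
    {
        return lower - 'a';
    }
    return -1;
}

// Hypothetical helper: allocate a node with no children.
node *new_node(void)
{
    node *n = malloc(sizeof(node));
    if (n != NULL)
    {
        n->is_word = false;
        for (int i = 0; i < 27; i++)
        {
            n->children[i] = NULL;
        }
    }
    return n;
}

// Walk down from root one letter at a time, creating nodes as needed.
bool trie_insert(node *root, const char *word)
{
    node *iterator = root;
    for (int i = 0; word[i] != '\0'; i++)
    {
        int s = slot(word[i]);
        if (s < 0)
        {
            return false;
        }
        if (iterator->children[s] == NULL)
        {
            iterator->children[s] = new_node();
            if (iterator->children[s] == NULL)
            {
                return false;
            }
        }
        iterator = iterator->children[s];
    }
    iterator->is_word = true;   // last letter of a valid word
    return true;
}

// Follow the letters downwards; a NULL child means misspelled.
bool trie_check(const node *root, const char *word)
{
    const node *iterator = root;
    for (int i = 0; word[i] != '\0'; i++)
    {
        int s = slot(word[i]);
        if (s < 0 || iterator->children[s] == NULL)
        {
            return false;
        }
        iterator = iterator->children[s];
    }
    return iterator->is_word;
}
```

After inserting "cat" and "cats", trie_check accepts "cat" and "CATS" but rejects "ca", because the node for the second letter exists without being marked as the end of a word.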

unload

- unload nodes from bottom to top!
- travel to lowest possible node, then free all pointers in children
- then, backtrack upwards, freeing all elements in each children array until you hit root node
- natural recursive implementation
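A recursive sketch of the bottom-to-top unload, using the trie struct from the Structure slide. The function name trie_free and the returned count of freed nodes are additions for illustration; the count only exists to make the behaviour easy to check.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    bool is_word;
    struct node *children[27];
} node;

// Free n and everything below it. Children are freed first, so each
// node is released only after its whole subtree is gone.
// Returns the number of nodes freed.
int trie_free(node *n)
{
    if (n == NULL)
    {
        return 0;               // nothing below an empty slot
    }
    int freed = 0;
    for (int i = 0; i < 27; i++)
    {
        freed += trie_free(n->children[i]);
    }
    free(n);                    // backtracking: free this node last
    return freed + 1;
}
```

The recursion depth is bounded by the longest word plus one, since each level of the trie corresponds to one letter.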
