Contains code for homework 5 problems.
Zachary Perry, 3/11/25
- cmd/problem_1/problem_1.go: contains code for problem 1
- cmd/problem_2/problem_2.go: contains code for problem 2
- input/: contains sample input files for each problem
- files starting with 1_ represent inputs for problem 1
- files starting with 2_ represent inputs for problem 2
- Compile using the Makefile:
make
- Clean & remove bin files:
make clean
- To run problem_1:
./bin/problem_1 filename
NOTE: This will output a binary file to the output/ directory
- To run problem_2:
./bin/problem_2 filename
NOTE: I am using Go version 1.22.9, as this is the version installed on the UTK Tesla/Hydra machines (where I assume this will be run)
Greedy Algorithms: Write an algorithm in the language of your choice to provide a Huffman encoding from the contents of a .txt file (just the 26 lower case letters of the alphabet in some arbitrary combination and length) and apply it to compress the file. Your program should output 2 things, the frequency table for the characters, and the size of the file (in bytes) before and after the encoding. Note: You will build the frequency table based on the content of the file.
To solve this question, I first started with reading in the file contents and creating a character frequency table, which records how many times each character appears in the file. Then, I created a min heap that stores values of type node (shown below). This way, all characters in the min heap can be prioritized by their frequency. NOTE: The min heap I used was based on the implementation suggested by the Go documentation. I implemented the required interface and used it with my node struct.
// node struct represents a node in the MinHeap.
// char -> represents the character read in from the file for this node.
// freq -> frequency the character appears in the file.
// left -> left node connection in the min heap.
// right -> right node connection in the min heap.
// NOTE: This is also used to create new 'parent' nodes, that point to two 'char' nodes, with the frequency set to the total of both left.freq + right.freq.
type node struct {
	char  byte
	freq  int
	left  *node
	right *node
}
Once the min heap is built, I am then able to build the Huffman tree. This just consists of popping off the two highest-priority nodes (lowest frequencies), making a parent node (with frequency equal to their sum), and pushing it back into the min heap, repeating until only one node, the root, remains. Then, I traverse the tree and build the Huffman codes.
Once I have all of the Huffman codes created for their corresponding letters based on the tree, I can fully encode and write a binary file. The tricky part here was converting the Huffman codes (which I saved as strings) into actual bytes to be written. I accomplished this by first traversing the original file contents, char by char, looking up the corresponding Huffman code, and concatenating it to a string. The resulting string, which was all codes added together, was then parsed 8 bits (one byte) at a time, converted to an actual byte, and stored. I also made sure to account for padding of the last element in the string, as more often than not, the very last bit string has a length < 8. If so, I pad it with zeros, then convert it to a byte.
The results were then written, producing a compressed binary file. Here is the output when run on a file with a size of 100,001 bytes:
======================
| CHAR | FREQ |
======================
| a | 3861 |
| b | 3851 |
| c | 3896 |
| d | 3914 |
| e | 3835 |
| f | 3817 |
| g | 3876 |
| h | 3922 |
| i | 3875 |
| j | 3726 |
| k | 3833 |
| l | 3860 |
| m | 3756 |
| n | 3842 |
| o | 3913 |
| p | 3784 |
| q | 3842 |
| r | 3819 |
| s | 3861 |
| t | 3850 |
| u | 3846 |
| v | 3802 |
| w | 3893 |
| x | 3893 |
| y | 3800 |
| z | 3833 |
======================
Size of the file BEFORE compression: 100001 bytes
Size of the file AFTER compression : 59572 bytes
Network Flow: Write an algorithm in a language of your choice that takes as input a flow network in modified DIMACS format and prints all flow augmenting paths. Ensure infinite looping isn’t possible. I will try to break your code. Ex. File (tab separated values)
5 6
0 1 4
0 2 3
0 3 5
1 4 7
2 4 18
3 4 6
The first row is the vertex count and then the arc count. The remaining rows are the arcs: vertex 1, vertex 2, and arc weight. 0 will always be the source vertex, and the vertex with the largest id will always be the sink. In the above example that makes 0 the source and 4 the sink.
To accomplish this, I started with reading in any input files (accounting for the DIMACS formatting) and creating the graph. My approach to creating the graph involves using two specific structs:
// Graph struct will represent all edges.
// Key in the map -> vertex value.
// Val for each vertex -> list of edges.
type Graph struct {
	edges map[int][]*Edge
}
// Edge struct represents an individual edge.
// to -> destination vertex, where this edge goes to.
// capacity -> weight of the edge, capacity that can flow through it.
// flow -> what is currently flowing through the edge.
type Edge struct {
	to       int
	capacity int
	flow     int
}
Once the graph is created, with all edges stored in the map keyed on their source vertex, I am then able to find all of the flow augmenting paths via the Edmonds-Karp algorithm. Once all are found, I output the corresponding paths, their flows, and the total flow (answer to the given input shown below):
Augmenting Path #0 0 -> 1 -> 4 | Flow = 4
Augmenting Path #1 0 -> 2 -> 4 | Flow = 3
Augmenting Path #2 0 -> 3 -> 4 | Flow = 5
One interesting issue I ran into was handling cycles. Whenever I was calculating the maximum amount of flow that could be sent through a given path, I would search for the first occurrence of the needed edge for v1 -> v2. This sometimes resulted in using an edge that was part of a cycle (a backwards edge with 0 capacity). This would make my output completely wrong, even though I was searching for the right pair of vertices. For example:
4 5
0 1 10
1 2 5
2 3 10
3 1 5
1 3 5
Incorrect output:
Augmenting Path #0 0 -> 1 -> 3 | Flow = 10
This is NOT correct: for this specific example, the flow for this path should be 5. In this case, I was selecting the cyclic edge that had 0 as its capacity, resulting in 10. To solve this, whenever I calculate the minimum capacity, I make sure I grab the correct edge by tracking the maximum residual capacity among the parallel edges between the two vertices. If there are multiple, I keep only the valid one and use its value for the minimum capacity. This ensures I am always using the edge with the most available capacity and avoids cycles / infinite looping.
- Huffman Coding Wikipedia
- Huffman Coding Generator (for checking my output)
- Edmonds-Karp Wikipedia
- Go Heap Documentation
- Network Flow Visualization (for checking my output)
- Introduction to Algorithms - Fourth Edition (for both Network Flow and Greedy)