Friday, November 15, 2013

Data Structures: Making Computing Possible

Data is at the heart of modern computing. The majority of what a computer scientist will learn, use, and possibly even research comes down to two things: processing data and storing data. The former is achieved through algorithms and the latter through data structures. Algorithms define the means to achieve particular tasks, generally involving the processing of data, and are quite varied in what they do based on any number of possible problems that need to be solved. Data structures on the other hand, while still just as varied as algorithms, are singular in purpose. All data structures have the common goal of storing and allowing for the access of data, but their variety comes from the types of data and the pursuit of the most efficient ways of storing and accessing that data based on its type and the design of the system.

While there are countless algorithms out there, and in fact any explicit process for achieving a particular task is considered an algorithm, there are only a handful of data structures out there and the discovery of revolutionary new data structures is no small feat or event. They generally have trade-offs in efficiency based on their particular use as well, making knowing them even more important for programmers. The three main types of data structures come in the form of linear data structures, trees, and maps.

Linear data structures are, like the name implies, a stringing together of points along a line. Any piece of data has at most two neighbors to it, one before and one after. The most common types of linear structures are arrays and linked lists, although in concept both are structured the same and the efficiency of various actions is the key difference. Arrays, for example, allow for a particular data value to be looked up very efficiently while linked lists allow for a new value to be placed between two existing values very efficiently. In this case the opposite suffers in performance on the other's task. Linear structures can be nested to create multidimensional structures as well such as matrices, but ultimately the linear structure remains intact for each component.

Trees get their name from the branching out from a common point that they exhibit, much like tree branches. They generally involve some method for subdividing each branch to allow for much more efficient search operations. This, however, requires that the data be related in such a way that predictable comparisons be able to be made to determine which branch to take. If this sort of comparison can't be made and result in the same choice every time the structure will not work. Trees generally have the advantage of being efficient in both insertion and retrieval, which if you recall was a one-or-the-other choice with linear structures. While neither operation is as fast as the superior linear counterpart, the efficiency gain for both far exceeds the trade-offs of the linear structures.

Maps tend to stand alone as data structures in that they fit a particular format of data that trees and linear structures can't handle. They are used when a particular data point is mapped to by a particular associated value or values, commonly referred to as keys. A key is a unique, small value that has data, generally much larger in size, associated to it in such a way that when a particular key is searched for the associated data value is returned. For example, one could use people's names (provided they are unique) as a key to retrieve more extensive data on the person. Maps can be stored in trees where the ordering is determined by the key or can be stored as hashes, most easily thought of as a sparsely-populated array containing the keys so that a particular key can be looked up and its associated data returned more efficiently than the tree counterpart.

The specifics of data structures can be quite complex and provided is only a high level overview of some of the most common types and features, but nonetheless they are a very aspect of computer science and worth learning about for anyone interested in programming. Many books and websites exist on data structures as well as algorithms for more information on the topic and greater detail.

No comments:

Post a Comment