The Array Data Structure
by
Arrays are a basic data structure that can be used in virtually all programming languages. They have the following key features:
- fixed size
- array size and item type must be known upfront
- items are stored in a contiguous memory location
- items must have the same type
- indexed by contiguous positive integers
I have previously written a blog post about the basic features of arrays and examples in C++
, where I describe arrays only in the context of C++. Now I want to take a step back and look at the general data structure of arrays.
Creating Arrays
When creating an array, think of the memory as blocks, each one has the size of your array type. So if we create an array in C with char* myarray[6];
, the program will look at the memory and find the next available free block of 6 bytes, because a char
is exactly one byte.
Let’s assume our memory looks like this, where each []
block contains one byte and [-]
means the memory is already taken up by another part of the program.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
[-] [-] [ ] [ ] [-] [-] [-] [-] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]
The second and third block is free, but we need six blocks of contiguous memory, so that’s not an option. The ninth block is free and the six blocks following it are free too. Sounds like we have found just the right spot. Let’s mark the new reserved memory space for our array with [+]
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
[-] [-] [ ] [ ] [-] [-] [-] [-] [+] [+] [+] [+] [+] [+] [ ] [ ]
Accessing or Updating an Array Item at an Index
Complexity = O(1)
The nature of arrays makes it very cheap to access items, because the memory location of each item can be calculated by using the head pointer (pointing to the first item), the index and the item type. Let’s assume we have an array defined like this
int arr[3] = {1, 2, 3};
Accessing item 2
can happen in constant time, because all the machine has to do is to calculate headPtr + typeSize * (index - first-index)
. This makes it easy to access even with very large arrays.
One downside is that we can easily make the mistake of trying to access an index that’s larger than the size of the array. The program will happily abide and give us the random number that happens to live at the given memory location. Although most programming languages have array implementations that protect you from that kind of thin.
Updating an array item follows the same logic of calculating the address of the item in constant time, regardless of the item position or array size.
Insertion or Deletion of Array Items at an Index
Complexity = O(n)
Inserting a new item at an index requires linear time complexity at worst case. To insert a new item at the very first position we have to move every item to the next position, to be able to squeeze the new item in.
Similarly, to delete an item, we have to move every item after the deleted index one place to the left.
However, there is a special case of insertion or deletion at the end of the array. This happens in constant time, because there are no items to the right that have to be moved afterwards. (Except for dynamic arrays, where insertion can mean that the whole array has to be re-allocate in a different memory location)
Traversing the Array
Complexity = O(n)
The time complexity of traversing an array is linear, because we can simply move the iterator forward one step at a time until we come to the end of the array.
Other things to consider
There are static and dynamic arrays, with the latter being an abstract data structure. Dynamic arrays can grow and shrink dynamically and re-allocate a larger base-array if the size reaches the capacity limit. Implementations of dynamic arrays are for example vector
in C++ and list
in Python.
Static arrays are fixed in size, that means that they can’t grow or shrink. It is necessary to create a new array, copy the items, and free the old array. Dynamic arrays don’t have that limitation, but there it can also be necessary to re-allocate the array.
Array indexes are often zero-based. Some languages use one-based index and others let you specify the index. Arrays are indexed by contiguous integers.
Multi-Dimensional Arrays
Since arrays are a basic data structure, they can contain other arrays. Technically there is no limit on how deeply arrays are nested.