Searching and Sorting
Searching and sorting techniques play a vital role in maintaining database applications. A database is a collection of a massive volume of data stored in the form of records, and each record contains several attributes depending on the structure defined. Without proper searching and sorting methods, retrieving a specific record from a database whose records are not stored in sorted order is difficult and time-consuming.
Searching
Searching is the process of finding the location of an element in a list or array of elements. In an array, the elements are stored in consecutive memory locations, and the target is compared with each element in turn until an exact match is found. If the element is found, the search is said to be successful; otherwise the search terminates unsuccessfully.
The time complexity of a searching technique depends on the number of comparisons made to find an exact match. The methods used for searching are,
- Linear /Sequential Search
- Binary / Logarithmic Search
Sequential Searching
Linear search is simply searching for an element in a linear fashion, either in an array or in a linked list. The search starts from the beginning and scans the elements one by one until the end of the array or linked list is reached. If the search is successful it returns the location of the element; otherwise it returns a failure notification.
Algorithm for sequential search
Let us take the linear search algorithm for an array.
- Step 1: assign index=0
- Step 2: Scan each element of the array one by one
- Step 3: If a match occurs, return the index value; otherwise set index=index+1.
- Step 4: Repeat the same process until the end of the array is reached.
- Step 5: Return the failure notification.
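A minimal C sketch of the array version described by these steps; the array contents, the sample values, and the function name are chosen only for illustration.

```c
#include <stdio.h>

/* Returns the index of key in arr[0..n-1], or -1 as the failure notification. */
int linear_search(const int arr[], int n, int key) {
    for (int index = 0; index < n; index++) {   /* Steps 1-2: scan one by one */
        if (arr[index] == key)                  /* Step 3: match found        */
            return index;
    }
    return -1;                                  /* Step 5: failure            */
}

int main(void) {
    int a[] = {7, 3, 9, 1, 5};
    printf("%d\n", linear_search(a, 5, 9));     /* prints 2  */
    printf("%d\n", linear_search(a, 5, 4));     /* prints -1 */
    return 0;
}
```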
Let us take the linear search algorithm for a linked list.
- Step 1: Take a pointer ptr of node type and initialize it with start.
- Step 2: Scan each node of the linked list by traversing it with the help of ptr: ptr=ptr->link;
- Step 3: If a match occurs, return the node value.
- Step 4: Repeat the same process until NULL is reached.
- Step 5: Return the failure notification.
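A corresponding C sketch for the linked-list version; the node structure and the short sample list are assumed purely for illustration.

```c
#include <stdio.h>

/* A minimal singly linked list node, assumed for illustration. */
struct node {
    int data;
    struct node *link;
};

/* Returns a pointer to the first node holding key, or NULL on failure. */
struct node *list_search(struct node *start, int key) {
    for (struct node *ptr = start; ptr != NULL; ptr = ptr->link) {  /* Steps 1-2 */
        if (ptr->data == key)                                       /* Step 3    */
            return ptr;
    }
    return NULL;                                                    /* Step 5    */
}

int main(void) {
    struct node c = {9, NULL}, b = {5, &c}, a = {2, &b};  /* list: 2 -> 5 -> 9 */
    struct node *hit = list_search(&a, 5);
    printf("%s\n", hit ? "found" : "not found");
    return 0;
}
```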
Analysis
As its name suggests, this algorithm searches the elements one by one in a linear manner. For a successful search, it scans the array up to the position where it finds the element; if the element is at the first position, this is the best case. For an unsuccessful search, it scans up to the last element of the array.
The main disadvantage of the linear search method is that it is time-consuming and becomes inefficient when a large number of elements is present in the list. The time complexity of the linear search method is O(n).
Logarithmic Search
Sequential search hits its worst case when the element is at the end of the list. To address this, there is a more efficient searching technique called binary search. The condition for binary search is that the data must be in a sorted array. We compare the element with the middle element of the array: if it is less than the middle element, we search the left portion of the array, and if it is greater, we search the right portion. We then take only that portion and compare the element with its middle element. This iteration continues until we find the element or the remaining portion has nothing left to search.
An iterative algorithm for Binary Search
K is a list of N elements in ascending order, X is the element to be found, low is a temporary variable for the lower limit of the search interval, middle is a temporary variable for the middle of the search interval, and high is a temporary variable for the upper limit of the search interval.
- Step 1: read the sorted array k of n elements
- Step 2: enter the x value, i.e. the element to be found
- Step 3: initialize the low value to 0 and the high value to n-1
- Step 4: perform the search while the following condition holds
- while (low<=high)
- Step 5: obtain the index of the midpoint
- middle=(low+high)/2
- Step 6: compare the searched element with the list
- if (x<k[middle]) then
- high = middle-1
- else if (x>k[middle]) then
- low = middle+1
- else if (x==k[middle]) then
- print element found at position middle
- return middle
- go to step 4
- Step 7: if the searched element is not found then
- print searching element not found
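The iterative algorithm can be sketched in C as follows; the sample array values in main are illustrative only.

```c
#include <stdio.h>

/* Iterative binary search on k[0..n-1], which must be in ascending order.
 * Returns the index of x, or -1 if x is not present. */
int binary_search(const int k[], int n, int x) {
    int low = 0, high = n - 1;
    while (low <= high) {                 /* Step 4 */
        int middle = (low + high) / 2;    /* Step 5 */
        if (x < k[middle])                /* Step 6: search the left portion  */
            high = middle - 1;
        else if (x > k[middle])           /* search the right portion         */
            low = middle + 1;
        else
            return middle;                /* element found at position middle */
    }
    return -1;                            /* Step 7: not found                */
}

int main(void) {
    int k[] = {3, 8, 12, 17, 25, 40};
    printf("%d\n", binary_search(k, 6, 17));  /* prints 3  */
    printf("%d\n", binary_search(k, 6, 5));   /* prints -1 */
    return 0;
}
```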
A recursive algorithm for Binary search
K is a list of N elements in ascending order, X is the element to be found, low is a temporary variable for the lower limit of the search interval, middle for the middle of the search interval, high for the upper limit of the search interval, and loc is the position variable.
- Step 1: read the sorted array k of n elements
- Step 2: enter the x value, i.e. the element to be found
- Step 3: initialize the low value to 0 and the high value to n-1
- Step 4: if low>high, return failure (the element is not present)
- Step 5: obtain the index of the midpoint
- middle=(low+high)/2
- Step 6: search the appropriate portion recursively until the element is found
- if (x<k[middle]) then
- loc = binary-search(low,middle-1,k,x)
- else if (x>k[middle]) then
- loc = binary-search(middle+1,high,k,x)
- else
- loc = middle
- return (loc)
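A recursive C sketch of the same search; the argument order differs slightly from the pseudocode above and is chosen only for readability.

```c
#include <stdio.h>

/* Recursive binary search over k[low..high] (ascending order).
 * Returns loc, the position of x, or -1 if the interval is empty. */
int binary_search_rec(const int k[], int low, int high, int x) {
    if (low > high)
        return -1;                                       /* Step 4: not found */
    int middle = (low + high) / 2;                       /* Step 5            */
    if (x < k[middle])
        return binary_search_rec(k, low, middle - 1, x); /* left portion      */
    else if (x > k[middle])
        return binary_search_rec(k, middle + 1, high, x);/* right portion     */
    else
        return middle;                                   /* loc = middle      */
}

int main(void) {
    int k[] = {3, 8, 12, 17, 25, 40};
    printf("%d\n", binary_search_rec(k, 0, 5, 25));      /* prints 4 */
    return 0;
}
```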
The non-recursive binary search algorithm uses a while loop, and the number of iterations depends on the number of elements in the list. This method is fast compared with the linear search method: a worst-case search requires O(log n) iterations, where n is the number of elements in the list.
Fibonacci Search
Fibonacci search, also known as Fibonaccian search, is an improved version of binary search. The improvement is that, instead of using the middle element at each step to split the array, the key is probed at positions determined within progressively smaller intervals. These intervals are determined with the help of the Fibonacci numbers, which are defined as follows.
- f(0) = 0
- f(1) = 1
- f(n) = f(n-1) + f(n-2) for n >= 2
Fibonacci search is a divide-and-conquer algorithm for searching a sorted array. It splits the array at positions corresponding to Fibonacci numbers. Its time complexity is of the same order as binary search, but it computes the probe positions using only addition and subtraction.
Algorithm for Fibonacci Search
Let f(n) be the nth Fibonacci number, so f(n+2) = f(n+1) + f(n) for n >= 0, with f(0)=0 and f(1)=1. To find whether an item is in a list of l = f(n) ordered numbers, the following procedure is used.
- Step 1: Set n to the value m for which the list has l = f(m) entries
- Step 2: If n == 0, the item is not found; go to step 7
- Step 3: Check the item against the entry in position f(n-1)
- Step 4: If a match is found, go to step 7
- Step 5: If the item is less than entry f(n-1), discard the entries from position f(n-1) to position l
- Set n = n-1
- go to step 2
- Step 6: If the item is greater than entry f(n-1), discard the entries from position 1 to position f(n-1) and renumber the remaining entries
- Set n = n-2
- go to step 2
- Step 7: Stop
The time complexity of the Fibonacci search algorithm is O(log n).
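A possible C sketch of Fibonacci search, following the usual array-based formulation in which the probe position is offset + f(n-2); the function name and the sample values are illustrative only.

```c
#include <stdio.h>

/* Fibonacci search on a sorted array arr[0..n-1].
 * Returns the index of x, or -1 if it is not present. */
int fibonacci_search(const int arr[], int n, int x) {
    int fib2 = 0, fib1 = 1;          /* f(m-2), f(m-1)               */
    int fib = fib1 + fib2;           /* f(m): smallest value >= n    */
    while (fib < n) {
        fib2 = fib1;
        fib1 = fib;
        fib = fib1 + fib2;
    }

    int offset = -1;                 /* end of the eliminated front part */
    while (fib > 1) {
        int i = (offset + fib2 < n - 1) ? offset + fib2 : n - 1;  /* probe */
        if (arr[i] < x) {            /* drop the front portion       */
            fib = fib1; fib1 = fib2; fib2 = fib - fib1;
            offset = i;
        } else if (arr[i] > x) {     /* drop the back portion        */
            fib = fib2; fib1 = fib1 - fib2; fib2 = fib - fib1;
        } else {
            return i;                /* match found                  */
        }
    }
    if (fib1 && offset + 1 < n && arr[offset + 1] == x)
        return offset + 1;           /* last remaining candidate     */
    return -1;
}

int main(void) {
    int a[] = {2, 5, 9, 14, 20, 31, 47, 60};
    printf("%d\n", fibonacci_search(a, 8, 31));   /* prints 5 */
    return 0;
}
```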
Sorting
Sorting is a technique of organizing data: it is the process of arranging records in either ascending or descending order, i.e., bringing some orderliness to the data. Sorting methods are very important in data structures. Sorting can be performed on any one attribute, or on a combination of attributes, present in each record. Searching is much easier and more efficient when the data is stored in sorted order. The efficiency of a sorting technique depends on memory utilization, sorting time, processor utilization, and so on. Sorting is classified into the following types:
- Bubble sort
- Selection sort
- Insertion sort
- Shell sort
- Quick sort
- Merge sort
- Heap sort
Exchange/Bubble Sort
Bubble sort is one of the simplest but least efficient sorting algorithms. It sorts by repeatedly comparing pairs of adjacent elements and interchanging them when they are out of order.
It is inefficient because the number of comparisons grows rapidly with the number of elements to be sorted. If there are n elements to be sorted, the number of passes required is n-1.
If N elements are given in memory, then for sorting we do the following steps:
- Step 1: First compare the 1st and 2nd elements of the array; if arr[0]<arr[1] they are already in order, so move on and compare the 2nd with the 3rd (otherwise interchange them first).
- Step 2: If arr[1]>arr[2], i.e. the 2nd and 3rd elements are out of order, interchange the values of the 2nd and 3rd.
- Step 3: Now compare the value of the 3rd element (which may hold the old 2nd value) with the 4th.
- Step 4: Similarly, compare until the (N-1)th element is compared with the Nth element.
- Step 5: Now the highest value has reached the Nth place.
- Step 6: Repeat the same comparisons for the remaining N-1 elements.
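A minimal C sketch of bubble sort following these steps; the swapped flag is an optional early-exit check that gives the O(n) behavior for an already sorted array mentioned below, and the sample values are illustrative.

```c
#include <stdio.h>

/* Bubble sort: after pass p, the largest of the remaining elements
 * has "bubbled up" to position n-1-p. */
void bubble_sort(int arr[], int n) {
    for (int pass = 0; pass < n - 1; pass++) {
        int swapped = 0;
        for (int i = 0; i < n - 1 - pass; i++) {
            if (arr[i] > arr[i + 1]) {          /* adjacent pair out of order */
                int t = arr[i];
                arr[i] = arr[i + 1];
                arr[i + 1] = t;
                swapped = 1;
            }
        }
        if (!swapped)                           /* already sorted: stop early */
            break;
    }
}

int main(void) {
    int a[] = {5, 1, 4, 2, 8};
    bubble_sort(a, 5);
    for (int i = 0; i < 5; i++) printf("%d ", a[i]);  /* 1 2 4 5 8 */
    printf("\n");
    return 0;
}
```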
The Time Complexity of Bubble Sort Algorithm
- Worst case: O(n²)
- Average case: O(n²)
- Best case: O(n) (when the array is already sorted and the pass is stopped early because no swaps occur)
Analysis
In bubble sort, n-1 comparisons are made in the 1st pass, n-2 in the 2nd pass, n-3 in the 3rd pass, and so on, so the total number of comparisons is easy to calculate: (n-1)+(n-2)+(n-3)+...+3+2+1. This is an arithmetic progression, so we can apply the formula
sum = n(n-1)/2, which is O(n²).
The main advantage is the simplicity of the algorithm; the additional space requirement is only one temporary variable, and it behaves as O(n) for an already sorted array when the passes are cut short once no swaps occur.
Selection Sort
A selection sort algorithm sorts the list by selecting either the smallest or the largest element in the unsorted portion and placing it at its proper position (i.e., by swapping). In each pass, one element is moved to its final position. If there are n elements in the list,
in the first pass all the elements are compared and one element is placed at its final position. In the second pass the remaining n-1 elements are compared, then n-2, then n-3, and so on. This process continues until all the elements are sorted. The passes below use the variant that selects the smallest element and moves it to the front.
Note: If there are n elements in a list, we require n-1 passes.
Pass 1:
- Search the smallest element from arr[0] to arr[N-1].
- Interchange arr[0] with the smallest element.
- Result arr[0] is sorted.
Pass 2:
- Search the smallest element from arr[1] to arr[N-1].
- Interchange arr[1] with the smallest element.
- Result: arr[0], arr[1] are sorted.
Pass 3:
- Repeat as in pass 1 and pass 2 until the (N-1)th pass.
Pass N-1:
- Search the smallest element from arr[N-2] and arr[N-1].
- Interchange arr[N-2] with the smallest element.
Result: arr[0] … arr[N-1] are sorted.
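A minimal C sketch of selection sort using the smallest-element-first variant described in the passes above; the sample values are illustrative.

```c
#include <stdio.h>

/* Selection sort: in pass p, find the smallest element in arr[p..n-1]
 * and interchange it with arr[p], so arr[0..p] becomes sorted. */
void selection_sort(int arr[], int n) {
    for (int p = 0; p < n - 1; p++) {              /* n-1 passes */
        int min = p;
        for (int i = p + 1; i < n; i++)
            if (arr[i] < arr[min])
                min = i;                           /* remember smallest so far  */
        if (min != p) {                            /* at most one swap per pass */
            int t = arr[p];
            arr[p] = arr[min];
            arr[min] = t;
        }
    }
}

int main(void) {
    int a[] = {29, 10, 14, 37, 13};
    selection_sort(a, 5);
    for (int i = 0; i < 5; i++) printf("%d ", a[i]);  /* 10 13 14 29 37 */
    printf("\n");
    return 0;
}
```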
Advantage
Selection sort minimizes data movement by putting one entry into its final position in every pass. Hence this algorithm is useful for contiguous lists of large records, where moving entries is expensive.
Disadvantage
- If the elements are stored using a linked list, then insertion sort is more efficient than selection sort.
- The average-case and worst-case time complexities are the same, i.e., O(n²).
Analysis
As we have seen, the selection sort algorithm searches for the smallest element in the array and then moves that element to its proper position. In pass 1 it makes n-1 comparisons; the same process follows in pass 2, but with n-2 comparisons, because the first element is already at its proper position. Elements that are already at their correct positions are not disturbed.
Insertion Sort
As the name suggests, insertion sort is a technique which inserts an element at its appropriate location (in an array) by comparing it with the elements to its left and shifting the larger of those elements one place to the right.
Algorithm for Insertion Sort
A is a list of N elements (an array), t is a temporary variable used while moving values, k counts the passes, and j and i are control variables.
- Step 1: read the array A of n elements
- Step 2: for k=1 to (n-1) perform the following steps
- set t=A[k]
- set j=k-1
- while j>=0 and t<A[j] perform the following steps
- set A[j+1]=A[j]
- set j=j-1
- [End of the while loop]
- Assign the value of t to A[j+1]
- [End of the for loop]
- Step 3: Exit
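The algorithm above translates directly into C; the function name and sample array are illustrative.

```c
#include <stdio.h>

/* Insertion sort: element A[k] is inserted into the already sorted
 * portion A[0..k-1] by shifting larger elements one place to the right. */
void insertion_sort(int A[], int n) {
    for (int k = 1; k < n; k++) {
        int t = A[k];                       /* value to be inserted  */
        int j = k - 1;
        while (j >= 0 && t < A[j]) {        /* shift larger elements */
            A[j + 1] = A[j];
            j = j - 1;
        }
        A[j + 1] = t;                       /* drop t into its place */
    }
}

int main(void) {
    int a[] = {12, 11, 13, 5, 6};
    insertion_sort(a, 5);
    for (int i = 0; i < 5; i++) printf("%d ", a[i]);  /* 5 6 11 12 13 */
    printf("\n");
    return 0;
}
```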
The time complexity for insertion sort
- Worst case: O(n²)
- Average case: O(n²)
- Best case: O(n) (only n-1 comparisons for an already sorted array)
Analysis
In insertion sort, each element is inserted into the portion that is already sorted, and the comparisons start from the element just before it. The first element has no elements before it, so it requires no comparisons. In the worst case, the second element requires 1 comparison, the third requires 2 comparisons, the fourth requires 3, and so on; the last element requires n-1 comparisons.
Shell Sort/Increment Sort
The shell sort is based on insertion sort. The given list of elements is broken down into several smaller lists, insertion sort is applied to each of these smaller lists, and the smaller lists are then recombined. Another partition is made and the procedure is repeated. The list is broken into smaller lists by selecting elements that are a fixed distance apart.
This distance is determined by a factor called the increment. Consider the list 32, 21, 6, 43, 31, 7, 44. We may initially take the increment as 3. This breaks the list into 32, 43, 44; 21, 31; and 6, 7. The first list consists of element[0], element[3] and element[6], where the increment is 3; it is sorted using insertion sort. The next list is element[1] and element[4], with the same increment of 3, and the final list is element[2] and element[5], again with increment 3. After sorting them, the sublists are recombined.
As a next stage, the increment is reduced and the process is repeated. The process continues until the increment is 1; the choice of increments is left to us.
Algorithm for Shell Sort
A is a list of N elements (an array), incr is the current increment value, k is a temporary variable, and j and i are control variables.
- Step 1: read the array A of n elements
- Step 2: read the initial incr value (an odd value greater than 0, so that the final pass uses increment 1)
- Step 3: while incr>=1 perform the following steps
- for j=incr to n-1
- k=A[j]
- for i=j-incr; while i>=0 and k<A[i]
- A[i+incr]=A[i]
- i=i-incr
- [End of inner for loop]
- Assign the value of k to the array, i.e., A[i+incr]=k
- [End of outer for loop]
- incr=incr-2
- [End of the while loop]
- Step 4: Exit
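A C sketch of the shell sort algorithm above, keeping its decrement-by-2 increment scheme (so the initial increment is assumed to be odd); the sample list is the one from the earlier example.

```c
#include <stdio.h>

/* Shell sort using the increment scheme of the algorithm above:
 * start with an odd increment and reduce it by 2 each round, so the
 * final round is a plain insertion sort with increment 1. */
void shell_sort(int A[], int n, int incr) {
    while (incr >= 1) {
        for (int j = incr; j < n; j++) {          /* gapped insertion sort   */
            int k = A[j];
            int i = j - incr;
            while (i >= 0 && k < A[i]) {
                A[i + incr] = A[i];               /* shift within the chain  */
                i = i - incr;
            }
            A[i + incr] = k;
        }
        incr = incr - 2;                          /* next, smaller increment */
    }
}

int main(void) {
    int a[] = {32, 21, 6, 43, 31, 7, 44};
    shell_sort(a, 7, 3);                          /* increments 3, then 1    */
    for (int i = 0; i < 7; i++) printf("%d ", a[i]);  /* 6 7 21 31 32 43 44  */
    printf("\n");
    return 0;
}
```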
Quick Sort/Partition Exchange Sort
The most powerful of these sorting algorithms is quicksort. It uses the policy of divide and conquer and is also known as partition exchange sort. In quicksort we divide the original list into two sublists. We choose an item from the list, called the key (or pivot), such that all the elements to its left are smaller and all the elements to its right are greater. One sublist lies on the left side of the key and the second sublist on the right side, which means the key ends up in its final position. Hence there are two conditions for placing the key:
All the elements on the left side of the key should be smaller than or equal to the key, and all the elements on the right side of the key should be greater than or equal to the key. We then choose a key for each sublist in the same way, as long as a sublist still contains two or more elements.
The process for sorting the elements through quicksort is as-
- Step 1: Take the first element of the list as a key.
- Step 2: place key at the proper place on the list. So one element of the list i.e. key will be at its proper place.
- Step 3: create two sublists left and right side of the key.
- Step 4: repeat the same process until all elements of the list are at the proper position in the list
For placing the key at the proper place we have a need to do the following process
- Step 1: Compare the key with the elements one by one from right to left until an element with a value less than the key is found.
- Step 2: Interchange that element with the key.
- Step 3: Now compare from the interchanged position from left to right until an element with a value greater than the key is found.
- Step 4: Repeat the same process until the key is at its proper position.
Analysis
The running time of quicksort depends on the position of the key in the list, i.e., on how the key divides the list into sublists. The division may be nearly equal, or the key may fail to divide the list at all.
The time complexity for Quicksort
Average case: in the average case we assume that the list is divided evenly: the list is split into two sublists, these two into four sublists, and so on, so at level l of the division there are 2^(l-1) sublists and the total number of levels is about log₂n. The number of comparisons at any level is at most n, so the running time of quicksort is O(n log n).
Worst case: suppose the list of elements is already in sorted order. The key is then the first element, so the partition produces only one sublist, which lies on the right side of the first element and starts from the second element. Similarly, every further sublist is created only on the right side. The first element requires n comparisons, the second element requires n-1 comparisons, and so on, so in this case O(n²) time is required.
Algorithm for Quick Sort
A is an array of n elements, LB is the lower bound of the current sub-list, UB is the upper bound of the current sub-list, i and j are index variables, key is the key value, and flag is a logical variable.
- Step 1: read the array A and the bounds LB and UB
- Step 2: initialize the flag value, i.e., flag = true
- Step 3: perform the following sorting statements
- if LB<UB then
- i = LB
- j = UB+1
- key = A[LB]
- Repeat while (flag)
- i = i+1
- Repeat while A[i]<key
- i = i+1
- j = j-1
- Repeat while A[j]>key
- j = j-1
- if i<j then
- interchange A[i] and A[j]
- else flag = false
- interchange A[LB] and A[j]
- Call Quick_Sort(A,LB,j-1) (sort the 1st sub-list)
- Call Quick_Sort(A,j+1,UB) (sort the 2nd sub-list)
- Step 4: Exit
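A C sketch of this partition-exchange scheme, using the first element of the sub-list as the key as in the algorithm above; the bound check on i is added so the left-to-right scan cannot run past the sub-list, and the sample values are illustrative.

```c
#include <stdio.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Quicksort with the first element of the sub-list as the key.
 * Sorts A[LB..UB] in place. */
void quick_sort(int A[], int LB, int UB) {
    if (LB >= UB)
        return;
    int key = A[LB];
    int i = LB, j = UB + 1;
    while (1) {
        do { i++; } while (i <= UB && A[i] < key);   /* scan left to right  */
        do { j--; } while (A[j] > key);              /* scan right to left  */
        if (i < j)
            swap(&A[i], &A[j]);                      /* exchange the pair   */
        else
            break;
    }
    swap(&A[LB], &A[j]);                  /* place the key at its position  */
    quick_sort(A, LB, j - 1);             /* sort the left sub-list         */
    quick_sort(A, j + 1, UB);             /* sort the right sub-list        */
}

int main(void) {
    int a[] = {25, 10, 72, 18, 40, 3};
    quick_sort(a, 0, 5);
    for (int i = 0; i < 6; i++) printf("%d ", a[i]);  /* 3 10 18 25 40 72 */
    printf("\n");
    return 0;
}
```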
Merge/2-way merge sort
Merge sort is a sorting technique based on merging, generally applied to two sorted arrays. A third array is created, the elements of the two sorted arrays are compared, and the third array is filled accordingly. The final third array therefore contains the elements of both arrays in sorted order.
Algorithm for Merge Sort
Consider two arrays, A and B, each consisting of 3 elements; for example A = {5, 3, 4} and B = {8, 6, 7}.
In order to proceed with the algorithm, a third empty array C is created with a capacity of 6 elements (since arrays A and B contain 6 elements in total). The following steps narrate the stages of the algorithm.
Step 1: Initially both arrays A and B are sorted individually using an algorithm such as bubble sort or insertion sort; in our example this gives A = {3, 4, 5} and B = {6, 7, 8}.
Step 2: Compare the first elements of arrays A and B and drop the smaller one into array C.
- A[0]<B[0] then drop A[0] into C[0]
Step 3: Element ‘4’ of array A is compared with element 6 of array ‘B’. As A[1]<B[0], hence A[1] is dropped into the array C.
Step 4: Element ‘5’ of array ‘A’ is compared with element ‘6’ of array ‘B’. As A[2]<B[0] hence A[2] is dropped into the array ‘C’.
Step 5: As there are no elements left in A, the process terminates by inserting the rest of the elements of B into C. The resulting array C contains the elements in ascending order.
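A C sketch of this merging step; arrays A and B hold the example values assumed above.

```c
#include <stdio.h>

/* Merge two already sorted arrays A[0..na-1] and B[0..nb-1] into C,
 * which must have room for na + nb elements. */
void merge(const int A[], int na, const int B[], int nb, int C[]) {
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (A[i] <= B[j])
            C[k++] = A[i++];        /* smaller front element goes to C */
        else
            C[k++] = B[j++];
    }
    while (i < na) C[k++] = A[i++]; /* copy whatever is left in A      */
    while (j < nb) C[k++] = B[j++]; /* copy whatever is left in B      */
}

int main(void) {
    int A[] = {3, 4, 5}, B[] = {6, 7, 8}, C[6];
    merge(A, 3, B, 3, C);
    for (int k = 0; k < 6; k++) printf("%d ", C[k]);  /* 3 4 5 6 7 8 */
    printf("\n");
    return 0;
}
```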
Analysis
Suppose an array of size n is sorted with merge sort. Because elements are taken in pairs and each sorted pair is merged with another pair, merge sort requires at most log₂n passes. In each pass, the total number of comparisons is at most n, so merge sort requires at most n·log₂n comparisons, which is O(n log₂n). The main disadvantage of merge sort is its space requirement: it needs extra space of O(n).
The time complexity for Mergesort
- Worst case: O(n log n)
- Average case: O(n log n)
- Best case: O(n log n)
Heap Sort
Heap sort represents the elements of a heap tree in an array. For a max-heap, the root, which is the first element of the array, holds the largest value; heap sort repeatedly removes this largest value and places it at the end of the array.
Algorithm for Heapsort
- Step 1: Given an array of elements, build a heap by adjusting the elements from that array.
- Step 2: Delete the root element from the heap by placing it at the end of the array.
- Step 3: The remaining elements may no longer form a heap, so Step 1 (re-heapify) and Step 2 are repeated for the remaining elements. This process continues until all the elements have been deleted.
Note: The maximum index value of the array is decremented by one when an element is deleted.
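A C sketch of heap sort following these steps, using a max-heap stored in the array; the helper names heapify and heap_sort, and the sample values, are illustrative.

```c
#include <stdio.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Sift the element at index i down until A[0..n-1] is a max-heap again. */
static void heapify(int A[], int n, int i) {
    int largest = i, left = 2 * i + 1, right = 2 * i + 2;
    if (left < n && A[left] > A[largest])   largest = left;
    if (right < n && A[right] > A[largest]) largest = right;
    if (largest != i) {
        swap(&A[i], &A[largest]);
        heapify(A, n, largest);
    }
}

void heap_sort(int A[], int n) {
    for (int i = n / 2 - 1; i >= 0; i--)    /* Step 1: build the heap          */
        heapify(A, n, i);
    for (int end = n - 1; end > 0; end--) { /* Steps 2-3: delete root, shrink  */
        swap(&A[0], &A[end]);               /* root (largest) goes to the end  */
        heapify(A, end, 0);                 /* restore heap on the remainder   */
    }
}

int main(void) {
    int a[] = {12, 11, 13, 5, 6, 7};
    heap_sort(a, 6);
    for (int i = 0; i < 6; i++) printf("%d ", a[i]);  /* 5 6 7 11 12 13 */
    printf("\n");
    return 0;
}
```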
Comparing the different types of sorting:

| Sorting Technique | Best Case | Average Case | Worst Case |
|---|---|---|---|
| Bubble | O(n) | O(n²) | O(n²) |
| Insertion | O(n) | O(n²) | O(n²) |
| Selection | O(n²) | O(n²) | O(n²) |
| Shell | – | – | – |
| Quick | O(n log n) | O(n log n) | O(n²) |
| Merge | O(n log n) | O(n log n) | O(n log n) |
| Heap | O(n log n) | O(n log n) | O(n log n) |