Group by a column in given tabular data
Problem: Given tabular data, print all of its rows grouped by a chosen column. Grouping by a column means that all rows sharing the same value in that column are printed together.
Example
The input table below has three columns: brand, value in USD, and type of computer.
google | 300 | desktop |
google | 300 | desktop |
microsoft | 600 | laptop |
amazon | 400 | laptop |
microsoft | 500 | desktop |
google | 500 | laptop |
google | 400 | desktop |
amazon | 300 | laptop |
amazon | 300 | desktop |
google | 600 | laptop |
If we group by the 3rd column, the output will look something like this:
desktop
google 300 desktop
google 300 desktop
microsoft 500 desktop
google 400 desktop
amazon 300 desktop
laptop
microsoft 600 laptop
amazon 400 laptop
google 500 laptop
amazon 300 laptop
google 600 laptop
Approach 1: Build a map from each distinct value in the chosen column to the list of indexes of the rows that hold that value. Iterating over the map then gives the desired output.
Implementation in C++

    //
    // Created by mukul on 22-10-2020.
    //
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>
    using namespace std;

    /*
     * group_by groups the rows of a table by the value in one column.
     * Parameters: the table as a vector of rows, and the zero-based
     * index of the column to group by.
     * Returns a map from each distinct column value to the indexes
     * of the rows that hold it.
     */
    map<string, vector<int>> group_by(const vector<vector<string>>& arr, int col) {
        // map used to store the result
        map<string, vector<int>> res;
        // iterate over all the rows
        for (int i = 0; i < (int)arr.size(); i++) {
            res[arr[i][col]].push_back(i);
        }
        return res;
    }

    int main() {
        // table as a vector of rows: brand, value in USD, type of computer
        vector<vector<string>> array = {
            {"google", "300", "desktop"},
            {"google", "300", "desktop"},
            {"microsoft", "600", "laptop"},
            {"amazon", "400", "laptop"},
            {"microsoft", "500", "desktop"},
            {"google", "500", "laptop"},
            {"google", "400", "desktop"},
            {"amazon", "300", "laptop"},
            {"amazon", "300", "desktop"},
            {"google", "600", "laptop"},
        };

        // group by column 0, i.e. the first column
        map<string, vector<int>> res = group_by(array, 0);

        // print each group: the key first, then every row in that group
        for (const auto& i : res) {
            cout << i.first << endl;
            for (auto x : i.second) {
                for (const auto& ele : array[x]) {
                    cout << ele << "\t";
                }
                cout << endl;
            }
        }
        return 0;
    }
Output for the above program
amazon
amazon 400 laptop
amazon 300 laptop
amazon 300 desktop
google
google 300 desktop
google 300 desktop
google 500 laptop
google 400 desktop
google 600 laptop
microsoft
microsoft 600 laptop
microsoft 500 desktop
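The time complexity of the grouping step is O(n log k), where n is the number of rows in the table and k is the number of distinct values in the chosen column, because each insertion into a std::map costs O(log k) key comparisons. When k is small compared to n, this is close to O(n).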
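As a possible variation on Approach 1, the sketch below swaps std::map for std::unordered_map, which gives average constant-time lookups per row, and keeps the groups in the order their key first appears in the table instead of sorted order. The names Table, Groups and group_by_ordered are hypothetical and not part of the program above. The sketch assumes every row has at least col + 1 columns.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical aliases used only in this sketch.
using Table = std::vector<std::vector<std::string>>;
// Each group is a key plus the indexes of the rows that hold that key.
using Groups = std::vector<std::pair<std::string, std::vector<std::size_t>>>;

// Groups rows by the value in column `col` (zero-based).
// Groups are returned in order of first appearance of their key.
Groups group_by_ordered(const Table& table, std::size_t col) {
    Groups groups;
    // Maps each key to the position of its group inside `groups`.
    std::unordered_map<std::string, std::size_t> position;
    for (std::size_t i = 0; i < table.size(); ++i) {
        // Assumption: every row contains column `col`.
        assert(col < table[i].size());
        const std::string& key = table[i][col];
        auto it = position.find(key);
        if (it == position.end()) {
            // First time this key is seen: open a new, empty group.
            it = position.emplace(key, groups.size()).first;
            groups.emplace_back(key, std::vector<std::size_t>{});
        }
        groups[it->second].second.push_back(i);
    }
    return groups;
}
```

Called on the example table with col set to 2, this returns the desktop group first and the laptop group second. With col set to 0, the groups come out as google, microsoft, amazon, which is the order in which the brands first appear, not the alphabetical order that std::map produces.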