Group by a column in given tabular data

Problem: Given tabular data, print all of it grouped by a chosen column. Grouping by a column means that all rows sharing the same value in that column are printed together.

Example

The table below is a sample input with three columns: brand, value in USD, and type of computer.

google  300     desktop
google  300     desktop
microsoft       600     laptop
amazon  400     laptop
microsoft       500     desktop
google  500     laptop
google  400     desktop
amazon  300     laptop
amazon  300     desktop
google  600     laptop
The input table

If we group by the 3rd column, the output will look something like this:

desktop
google  300     desktop
google  300     desktop
microsoft       500     desktop
google  400     desktop
amazon  300     desktop
laptop
microsoft       600     laptop
amazon  400     laptop
google  500     laptop
amazon  300     laptop
google  600     laptop

Approach 1: Build a map from each distinct value in the chosen column to the list of indexes of the rows that contain that value. Iterating over the map and printing the rows at the stored indexes then gives the desired output.

Implementation in C++

//
// Created by mukul on 22-10-2020.
//
#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;

/*
 * Groups the rows of a table by the value in one column.
 * Parameters: the table as a vector of rows, and the zero-based index of the column to group by.
 * Returns a map from each distinct column value to the indexes of the rows containing it.
 */
map<string,vector<int>> group_by(const vector<vector<string>>& arr,int col){
    // map from column value to the indexes of the matching rows
    map<string,vector<int>> res;
    // iterate over all the rows, appending each row index to its group
    for(int i=0;i<static_cast<int>(arr.size());i++){
        res[arr[i][col]].push_back(i);
    }
    return res;
}


int main() {
    // the input table: brand, value in USD, type of computer
    vector<vector<string>> table = {
        {"google", "300", "desktop"},
        {"google", "300", "desktop"},
        {"microsoft", "600", "laptop"},
        {"amazon", "400", "laptop"},
        {"microsoft", "500", "desktop"},
        {"google", "500", "laptop"},
        {"google", "400", "desktop"},
        {"amazon", "300", "laptop"},
        {"amazon", "300", "desktop"},
        {"google", "600", "laptop"},
    };

    // group by column 0, i.e. the first column (brand)
    map<string,vector<int>> res = group_by(table, 0);

    // print each group key followed by the rows that belong to it
    for (const auto& group : res) {
        cout << group.first << endl;
        for (int row : group.second) {
            for (const auto& ele : table[row]) {
                cout << ele << "\t";
            }
            cout << endl;
        }
    }
    return 0;
}

Output for the above program

amazon
amazon  400     laptop
amazon  300     laptop
amazon  300     desktop
google
google  300     desktop
google  300     desktop
google  500     laptop
google  400     desktop
google  600     laptop
microsoft
microsoft       600     laptop
microsoft       500     desktop
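
The example at the top groups by the 3rd column, while the program above groups by the first. The following is a minimal sketch of the third-column case, assuming a zero-based column index of 2. The names `Table`, `format_groups` and `sample_table` are hypothetical helpers introduced only for this illustration; they are not part of the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Table = std::vector<std::vector<std::string>>;

// Maps each distinct value in column `col` to the indexes of the rows holding it.
std::map<std::string, std::vector<std::size_t>> group_by(const Table& table, std::size_t col) {
    std::map<std::string, std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < table.size(); ++i) {
        assert(col < table[i].size());  // assumption: every row has the requested column
        groups[table[i][col]].push_back(i);
    }
    return groups;
}

// Hypothetical helper: renders the groups as text, one key line followed by its rows.
std::string format_groups(const Table& table, std::size_t col) {
    std::ostringstream out;
    const auto groups = group_by(table, col);
    for (const auto& group : groups) {
        out << group.first << '\n';
        for (std::size_t row : group.second) {
            for (const auto& cell : table[row]) out << cell << '\t';
            out << '\n';
        }
    }
    return out.str();
}

// Hypothetical helper: the sample input table from the example.
Table sample_table() {
    return {
        {"google", "300", "desktop"},    {"google", "300", "desktop"},
        {"microsoft", "600", "laptop"},  {"amazon", "400", "laptop"},
        {"microsoft", "500", "desktop"}, {"google", "500", "laptop"},
        {"google", "400", "desktop"},    {"amazon", "300", "laptop"},
        {"amazon", "300", "desktop"},    {"google", "600", "laptop"},
    };
}
```

Writing `format_groups(sample_table(), 2)` to standard output lists the `desktop` rows first and the `laptop` rows second, because `std::map` iterates over its keys in sorted order.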

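Time complexity: grouping takes O(n log k) key comparisons, where n is the number of rows in the table and k is the number of distinct values in the grouping column, because each lookup in a std::map is logarithmic in the number of keys. When k is small and fixed, as in the example, this is effectively linear in n.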
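
Since `std::map` keeps its keys sorted, each lookup costs a logarithmic number of key comparisons. If sorted group order is not needed, a hash map avoids that factor. The sketch below is one possible variant, not the approach used above: it uses `std::unordered_map` for average constant-time lookups and lists the groups in order of first appearance. The names `Table`, `Groups` and `group_by_hashed` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Table = std::vector<std::vector<std::string>>;
// Each entry pairs a group key with the indexes of the rows in that group.
using Groups = std::vector<std::pair<std::string, std::vector<std::size_t>>>;

// Hypothetical variant: groups rows by column `col` with a hash map,
// returning the groups in the order their keys first appear in the table.
Groups group_by_hashed(const Table& table, std::size_t col) {
    Groups groups;
    std::unordered_map<std::string, std::size_t> slot;  // key -> position in `groups`
    for (std::size_t i = 0; i < table.size(); ++i) {
        assert(col < table[i].size());  // assumption: every row has the requested column
        const std::string& key = table[i][col];
        auto it = slot.find(key);
        if (it == slot.end()) {
            // first time this key is seen: open a new group at the end
            it = slot.emplace(key, groups.size()).first;
            groups.emplace_back(key, std::vector<std::size_t>{});
        }
        groups[it->second].second.push_back(i);
    }
    return groups;
}
```

The trade-off is that the output is no longer sorted by key: grouping the sample table by its first column this way would list google, then microsoft, then amazon.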
