Remove duplicate rows from a Google spreadsheet

I needed to remove duplicate rows from a Google spreadsheet and use a regex to extract the last part of the string in each row.

I had a specific case at work: I was given a long list of URLs and had to extract the part of each one after the last “/”. With no intention of going through them manually, I decided to speed things up with a little Google Spreadsheet scripting. I then noticed that there were duplicate URLs in my list, so first I needed to remove the duplicate rows.

I copied and pasted the URLs into a Google spreadsheet, filling the cells in column A.

A quick Google search turned up this script, which let me remove the duplicates quickly.
This is what I did:

Go to Tools > Script editor, paste the following script, save it, and run it from within the editor (with the sheet you want to clean still active in the other tab).

function removeDuplicates() {
  // Remove duplicate rows from the active sheet, keeping the first occurrence
  var sheet = SpreadsheetApp.getActiveSheet();
  var data = sheet.getDataRange().getValues();
  var newData = [];
  for (var i = 0; i < data.length; i++) {
    var row = data[i];
    var duplicate = false;
    for (var j = 0; j < newData.length; j++) {
      // Rows are compared as comma-joined strings
      if (row.join() === newData[j].join()) {
        duplicate = true;
        break;
      }
    }
    if (!duplicate) {
      newData.push(row);
    }
  }
  // Replace the sheet contents with the de-duplicated rows
  sheet.clearContents();
  sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
}
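
For longer lists, comparing every row against every row already kept can get slow. Below is a minimal sketch of the same idea using a lookup of rows already seen. It is plain JavaScript with no spreadsheet calls, so it can be tried outside the script editor; the name dedupeRows and the sample URLs are made up for illustration and are not part of the script above.

```javascript
// Hypothetical helper: returns a copy of `rows` with repeated rows removed,
// keeping the first occurrence of each row.
function dedupeRows(rows) {
  var seen = {};
  var unique = [];
  for (var i = 0; i < rows.length; i++) {
    // JSON.stringify keeps cell boundaries, so ["a,b", "c"] and
    // ["a", "b,c"] produce different keys (a plain join() would not).
    var key = JSON.stringify(rows[i]);
    if (!Object.prototype.hasOwnProperty.call(seen, key)) {
      seen[key] = true;
      unique.push(rows[i]);
    }
  }
  return unique;
}

// Made-up sample data in the shape getValues() returns: an array of rows.
var sample = [
  ["http://example.com/a"],
  ["http://example.com/b"],
  ["http://example.com/a"]
];
var result = dedupeRows(sample);
```

Inside removeDuplicates, the two nested loops could presumably be swapped for a call like dedupeRows(data), with the rest of the function left as it is.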

And voila – no more duplicates.

Next I needed to extract the title part of each URL, and here a small regex script came in handy:

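function regExThis(cell) {
  // Match the last "/" and everything after it
  var pattern = /\/(?:.(?!\/))+$/;
  var match = pattern.exec(cell);
  // Return the matched text, or an empty string when there is no match
  return match ? match[0] : "";
}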
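
Two things to keep in mind with this pattern: the match includes the leading "/", and a URL that ends in "/" does not match at all. If that matters for your list, here is a rough sketch of an alternative that uses plain string functions instead of a regex. The name lastSegment is hypothetical, and stripping trailing slashes first is my assumption about what you would want.

```javascript
// Hypothetical alternative: return the text after the last "/",
// ignoring any trailing slashes.
function lastSegment(url) {
  // Drop one or more trailing slashes so ".../my-post/" still works
  var text = String(url).replace(/\/+$/, "");
  var index = text.lastIndexOf("/");
  // No slash at all: return the whole string unchanged
  return index === -1 ? text : text.slice(index + 1);
}
```

It can be called from a cell the same way as the regex version.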

I can call this function from the cells of the spreadsheet. I added a formula such as =regExThis(A1) in the column next to my URLs and copied it down the entire column (the cell reference adjusts automatically for each row).

[Image: regular expression in a cell]
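
Copying the formula down works fine, but a custom function also receives a two-dimensional array when it is given a whole range, so one formula can cover the column. The following is only a sketch of that idea: regExRange and extractTail are names I made up, and the pattern is the same one used in regExThis.

```javascript
// Hypothetical per-cell helper: the last "/" and everything after it,
// or an empty string when nothing matches.
function extractTail(value) {
  var match = /\/(?:.(?!\/))+$/.exec(String(value));
  return match ? match[0] : "";
}

// Hypothetical custom function that accepts either a single cell value
// or a range (passed in as an array of rows).
function regExRange(input) {
  if (!Array.isArray(input)) {
    return extractTail(input);
  }
  return input.map(function (row) {
    return row.map(function (value) {
      return extractTail(value);
    });
  });
}
```

With this in the script editor, a single formula such as =regExRange(A1:A100) in B1 should fill the column next to the URLs; the range here is just an example.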

I was able to remove the duplicate rows and clean the cells with the regular expression script.

This gave me a list of cleaned strings that I could use for other automation, and a script I can reuse for cleaning in my daily work.