Shell script - convert Excel (xlsx) to CSV - remove blank space / tab space


Question: 

I receive an Excel file (xlsx) with multiple sheets for my project. The number of records ranges from 15k to 70k per sheet. I need to perform the following tasks on this data and then convert it to CSV. Converting to CSV first and then processing the data is fine too.

Input Example:

call_no  uniq_no  Type  Strength    Description
2456     15       TX    SomeSting        SomeSting
5263     15       BLL      SomeSting   SomeSting
4263     162      TX                SomeSting
2369     215      LH    SomeSting
4269     426      BLL   SomeSting       SomeSting
7412     162      TX    SomeSting   SomeSting

As per the requirement, I need to:

  1. Find duplicate values in column 'uniq_no' and delete all duplicate records except the original record (the first one).
  2. Replace blanks with data. (Just simple find-blank-and-replace-with-value logic.)
  3. Remove spaces/tabs around the value in any cell. (This point is not important; it's just a side quest.)

Output Example:

call_no  uniq_no  Type  Strength    Description
2456     15       TX    SomeSting   SomeSting
4263     162      TX    **NewDATA** SomeSting
2369     215      LH    SomeSting   **NewDATA**
4269     426      BLL   SomeSting   SomeSting

This is a routine task for me. I have fair knowledge of shell scripting, so if anyone can guide me with even a rough outline of a script for this, I can do the tweaks at my end. Please help.




1 Answer: 

Update: the desired platform for the script has been clarified, so this response is no longer applicable. However, I will leave it here in case a future viewer of this question stumbles upon it and finds it useful. Anyone writing a shell script on Ubuntu may be able to port over some aspects of this VBScript as well.

Here is something to get you started. If you record actions with Excel's macro recorder, remember that VBScript does not support named parameters: to use the same commands in a .vbs file you have to remove every Name:=value pair and pass the arguments by position instead.

prep_xlsx.vbs

Set objExcel = WScript.CreateObject("Excel.Application")

objExcel.Visible = True   ' True for testing; set to False for unattended runs

strFileName = "c:\tmp\vbs_test.xlsx"

Set objWb = objExcel.Workbooks.Open(strFileName)
Set objWs = objWb.Worksheets(1)

With objWs
    With .Cells(1, 1).CurrentRegion
        ' Fill every blank cell in the data block. 4 is xlCellTypeBlanks.
        ' Note: SpecialCells raises an error if the range has no blank cells.
        .Cells.SpecialCells(4).Value = "**NewDATA**"
        ' Keep the first row for each value in column 2 (uniq_no).
        .Cells.RemoveDuplicates 2, 1   ' Columns:=2, Header:=xlYes
        ' Trim leading/trailing spaces one column at a time.
        For c = 1 To .Columns.Count
            With .Columns(c)
                ' Destination:=.Cells(1), DataType:=xlFixedWidth (2); FieldInfo:=Array(0, 1) left out
                .TextToColumns .Cells(1), 2
            End With
        Next   ' next c
    End With
End With

objWb.Close True   ' save on close

objExcel.Quit
Set objExcel = Nothing

Note that removing leading/trailing spaces with the Range.TextToColumns method and xlFixedWidth can cause Excel to try to split a column in two if a cell has too many leading spaces. When that happens the script halts, because Excel asks for confirmation before overwriting the next column's values (which you do not want to do). It takes a significant number of spaces for Excel to guess that a value belongs in two columns, so unless a cell has more leading spaces than the length of a typical word there is nothing to worry about; it is just something to be aware of. For example, if there were twice as many leading spaces in D6, Excel might want to split that value across two columns.

[Image: vbs_test.xlsx before prep_xlsx.vbs]

[Image: vbs_test.xlsx after prep_xlsx.vbs]
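
For anyone who needs this on Ubuntu rather than Windows, the same three steps from the question can be sketched in Python once each sheet has been exported to CSV by some other means. This is only a rough outline under assumptions of my own: the input is already comma-separated with a clean header row, no row has more cells than the header, and a sheet (15k to 70k records) fits in memory. The names clean_rows, clean_csv and SAMPLE are hypothetical and made up for this illustration, and the sample data is adapted from the question's input example.

```python
import csv
import io


def clean_rows(rows, key="uniq_no", filler="**NewDATA**"):
    """Trim cells, drop later duplicates of `key`, and fill blank cells."""
    seen = set()
    cleaned = []
    for row in rows:
        # Step 3: trim spaces and tabs around every cell value.
        # (Assumes no row has more cells than the header.)
        row = {name: (value or "").strip() for name, value in row.items()}
        # Step 1: keep only the first record for each key value.
        if row[key] in seen:
            continue
        seen.add(row[key])
        # Step 2: replace empty cells with the filler text.
        cleaned.append({name: value or filler for name, value in row.items()})
    return cleaned


def clean_csv(text, key="uniq_no", filler="**NewDATA**"):
    """Apply clean_rows to CSV text and return the cleaned CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = clean_rows(reader, key, filler)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Sample adapted from the question; stray spaces and a tab added on purpose.
SAMPLE = (
    "call_no,uniq_no,Type,Strength,Description\n"
    "2456,15,TX,SomeSting,  SomeSting\n"
    "5263,15,BLL,SomeSting,SomeSting\n"
    "4263,162,TX,,SomeSting\n"
    "2369,215,LH,SomeSting,\n"
    "4269,426,BLL, SomeSting\t,SomeSting\n"
    "7412,162,TX,SomeSting,SomeSting\n"
)

if __name__ == "__main__":
    print(clean_csv(SAMPLE), end="")
```

Unlike the VBScript above, this trims first and only then compares uniq_no values, so a key with stray spaces still counts as a duplicate. To process a real file, read its contents into a string and pass that to clean_csv, then write the returned text to the output file.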
