Name: maxbre Date: April 1, 2008 at 08:04:44 Pacific Subject: Batch tab/space/comma delimited txt OS: win xp CPU/Ram: 2 gb
Comment:
I’m not sure whether this topic has already been covered in this forum (I’ve checked; if it has, please redirect me to the appropriate thread), but I need to solve this problem, hopefully with a batch script.
Let’s suppose I have a text file ‘space delimited’ called a.txt, like: A (space) B (space) 1 (space) 3 (space) 2 (space) 4 (space)
How can I change it to a ‘tab delimited’ file called b.txt, like: A (tab) B (tab) 1 (tab) 3 (tab) 2 (tab) 4 (tab)
remember that (space) or (tab) are just put here for sake of clarity.
To transform a ‘comma delimited’ text file into the corresponding ‘space delimited’ file I’m using something like this (assuming no other relevant commas exist in the file content):
set string=%,
for /f "tokens=*" %%a in (a.txt) do (
  set row=%%a
  set row=!row:%string%= !
  echo.!row!>>b.txt
)
But what about the case of a ‘tab delimited’ destination file? Should I refer somehow to the ASCII code of tabs? Any (alternative) solutions ? And how about taking into account that consecutive delimiters should be treated as a single one (e.g. two consecutive spaces replaced by a single tab)? Thanks
max
PS to ghostdog: I’m also trying to work out a solution in gawk but I’m still learning it (not yet confident but I’m practising); sorry about that but I’m so slow…
What's that % doing before the , ? In any case, why don't you just use the comma directly as in
set row=!row:,= !
"But what about the case of a 'tab delimited' destination file?"
What is so special about that? Why not treat it the same way as you do the space-separated file. Just use a tab instead of a space in your set command.
I also have a similar question. How would I use a space delimiter in a variable?
set /p address=Enter your address:
echo. >> %address%.txt
This will only use the first word as the filename.
EDIT: Got it to work.
echo. >> "%address%".txt
Then, if you want it in a different directory, first cd to the folder where "%address%".txt is located.
copy "%address%".txt "your\new\location"
i would still suggest you save your time and learn some real language/tools. Not just gawk: Perl, Python etc. are all better at file and string manipulation than batch!
to klint you are right, the % in the set command is not necessary (at least it seems so to me); I think the % just slipped in somehow (a typo?) but I tested the script and the % placed there is not harmful (?)
to ghostdog you are also right and, as I told you before, I'm trying to improve my knowledge of proper programming languages (python and gawk as you suggested) but it's a long run... As I already said here in this forum, I'm hanging on so much to batch scripting because sometimes I need to distribute "small tools" that are click-and-go easy to use, even for non-programmers. Can you imagine someone less of a programmer than me?
to maxbre. it doesn't mean you can't do "click-and-go easy to use" with other languages like Perl/Python etc. If you are proficient in them, making up a script is a breeze. Anyway, it's up to you.
Here is the batch to do the job. It handles files as outlined in post #1 with n tokens per line. Tab is a hot key, handled directly by the command interpreter, so it can't be deactivated by a caret like other special symbols; it must be encapsulated in a file, like radioactive materials.
:: spc2tab.bat
@echo off
setLocal EnableDelayedExpansion

:: [generate a temp file holding one token/tab per line]
type nul > file_tab.tmp
for /f "delims=" %%i in (file_spc.txt) do (
  for %%j in (%%i) do (
    echo.%%j>> file_tab.tmp
    type tab.tmp>> file_tab.tmp
  )
)

:: [detect the number of tokens+tabs per line]
set /p row=< file_spc.txt
set tkn=0
for %%i in (%row%) do set /A tkn+=2
set row=
set cnt=0

:: [create the target file with tab separators]
type nul > file_tab.txt
for /f "delims=" %%j in (file_tab.tmp) do (
  set /A cnt+=1
  set row=!row!%%j
  if !cnt! equ !tkn! (
    echo.!row!>> file_tab.txt
    set row=
    set cnt=0
  )
)

del *.tmp
:: end_of_batch
While amazing, this is much like explaining the behavior of solid-state devices without quantum physics: you can do it, but you will not get far that way. Better to code in Perl/Python or just gawk.
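For comparison, here is a minimal Python sketch of the same conversion, along the lines suggested earlier in the thread. The function names are made up for illustration, and the file names are simply the ones used in the batch above. It assumes any run of spaces counts as one delimiter. Unlike for /f, it keeps blank lines instead of skipping them.

```python
def spaces_to_tabs(line: str) -> str:
    """Collapse each run of whitespace into one tab; no leading or trailing tab."""
    # str.split() with no argument splits on runs of whitespace
    # and drops the line terminator as well.
    return "\t".join(line.split())


def convert_file(src: str, dst: str) -> None:
    """Write a tab-delimited copy of the space-delimited file src to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(spaces_to_tabs(line) + "\n")
```

Calling convert_file("file_spc.txt", "file_tab.txt") would then play the role of the whole batch, with no temp files.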
wow ivo ! (and also all other guys!) I know you are 'the great master' of batch scripting but I could not imagine you can handle 'radioactive matter' as easily as a piece of cake! As usual you (along with all other friends in this forum) are feeding me with loads of interesting topics to study (enough for next 10.000 years just by assuming an incredibly fast decay rate!) Thank you all again, god bless this forum! max
I am confused by your post. What exactly is so special about the tab character? Are you talking about filename completion? That's only relevant when you are actually typing at the keyboard on the command line. In other cases, it is treated as just any ordinary character. So no need to do rocket science.
I agree with you about rocket science, which is indeed a plague at work as it fascinates dumb people especially, but that is not the case here.
In the batch below try to replace the # char with the one generated by the TAB key: all my editors expand it to N spaces, not the ASCII hex 09. Hence the need for some esoteric code if you do want THAT control char (ASCII hex 00 - 1F).
@echo off & setlocal EnableDelayedExpansion
for /f "delims=" %%j in (in.txt) do (
set row=%%j
set row=!row: =#!
------
------
If you can give me a conventional way of coding it, I am glad to learn.
Oh I see what you mean. That's not a CMD problem, it's just a problem with your text editor. I'm sure you can find one that doesn't expand tabs to spaces. For example, Notepad. Even if your editor expands tabs, you may be able to change that behaviour in your Options menu. On my editor, I use a file-specific setting to expand tabs to spaces when writing .java, .c, .cpp files but not when writing .txt, .bat, .cmd files.
Now here's that line from your code, which I've changed using Notepad:
____________________________________________________________
C:\>type t.bat
set row=!row: = !

C:\>xxd -g1 t.bat
0000000: 73 65 74 20 72 6f 77 3d 21 72 6f 77 3a 20 3d 09  set row=!row: =.
0000010: 21 0d 0a                                         !..
____________________________________________________________
The hex dump proves it's a tab (09) and not a space.
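If xxd is not at hand, the same check can be sketched in Python. This is only an illustration: the helper names are made up, and the byte string below is the content of t.bat as shown in the dump (bytes.hex with a separator needs Python 3.8 or later).

```python
def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return data.hex(" ")


def contains_tab(data: bytes) -> bool:
    """True if the data holds at least one real TAB byte (hex 09)."""
    return b"\x09" in data


# The line from t.bat, with a real tab between '=' and '!'.
line = b"set row=!row: =\t!\r\n"
print(hex_dump(line))
print(contains_tab(line))
```

For a real file, the bytes could be read with open("t.bat", "rb").read() instead of being typed in.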
well, now you astronauts, would you please come back to earth and explain to me what you are talking about? Please...
to ivo is it the same mismatch I had about ‘what you see is not what you get’? and why and how is this esoteric piece of code
echo.e 0100 09 > dbg_cmd.tmp
echo.w >> dbg_cmd.tmp
echo.q >> dbg_cmd.tmp
debug tab.tmp < dbg_cmd.tmp > nul
producing a tab? What is the debug command about?
to klint does it mean that to replace a space with a tab I just need to hit the corresponding keys in the following command line: set row=!row:(space)=(tab)! obviously bearing in mind that (space) and (tab) are just put here to avoid confusion about ‘what you get but - sometimes - you don’t see’?
thanks again for your valuable help, this is going to be a sort of university for me
here is the code to handle your job. Anyway, the esoteric one works fine and is not harmful at all. This one replaces one or more spaces with just one tab delimiter. Replace the # by pressing TAB and in Notepad you will see a wide blown-up space: it is the "on the screen" view of the hex 09 ASCII TAB char.
Later I'll explain the cryptic code of debug.
@echo off & setlocal EnableDelayedExpansion
type nul > file_tab.txt
(set TAB=#)
for /f "delims=" %%i in (file_spc.txt) do (
  set row=
  for %%j in (%%i) do (set row=!row!%%j!TAB!)
  echo.!row!>> file_tab.txt
)
:: end_of_batch
first of all please review the previous post, as I heavily edited the script there to enable the handling of multiple spaces between tokens.
Now let us walk through the "esoteric" code ending with the "debug" command.
Debug is a legacy command dating back to the dawn of DOS and the age of the first x86 processors (I mean the 8086/8088 and the '80s). It is a raw, interactive command-line assembler/disassembler for creating and testing tiny pieces of machine code. It was meant for debugging native .com executables, now superseded by .exe files, either 16- or 32-bit.
Still part of the DOS subsystem, it can be useful for generating or modifying hex code. Here it is used to replace the # char with a TAB inside the three-byte file tab.tmp (23 0D 0A).
debug tab.tmp < dbg_cmd.tmp > nul
instructs debug to load tab.tmp into memory and to accept commands from the file dbg_cmd.tmp (previously echoed), suppressing the output messages (> nul).
dbg_cmd.tmp stores three commands
- e 0100 09 replaces (e means enter) the char # with TAB, as debug loads the selected file at segment offset 0100 (take that for granted, as the layout of a .com executable is far beyond this note)
- w (re)writes tab.tmp, now storing 09 0D 0A
- q quits debug
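As a rough illustration of what those three commands achieve, here is a small Python sketch that patches the first byte of the file content. It is not what debug does internally. The function name is hypothetical, and it assumes, as described above, that debug's offset 0100 corresponds to the first byte of the file.

```python
def patch_first_byte(data: bytes, value: int = 0x09) -> bytes:
    """Return a copy of data with its first byte replaced by value (TAB by default)."""
    if not data:
        raise ValueError("nothing to patch: data is empty")
    return bytes([value]) + data[1:]


# tab.tmp before the patch: '#', CR, LF (hex 23 0D 0A).
before = b"#\r\n"
after = patch_first_byte(before)
print(after.hex(" "))
```

On a real file, the content would be read with open("tab.tmp", "rb"), patched, and written back with open("tab.tmp", "wb"), which corresponds to the e and w steps.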
While cumbersome and unreadable, debug can solve *impossible* tasks by generating code out of the blue.
I hope the lesson was not too boring and that what you learned from the esoteric script may help in future challenges.
hi ivo the 'esoteric bat' is working fine but the 'simple one' seems to produce a double tab between each token; I'm saying this not because I want to bore you again with trivial tiny things, but just because it's time to let you know how much I appreciate your efforts and that I'm carefully studying all your lessons... thank you again max
are you sure you copied and pasted the revised version of the script in post #17? I edited it hours after I first posted it (see the opening of post #18).
I tested it with a scattered matrix and the result was a perfectly formatted table. Pay attention too when replacing the # symbol with the TAB, as the blown-up space has to be seven chars wide: the statement is enclosed in parentheses to allow a loose check of that. I checked the input and output files with a hex dump utility to verify the space/TAB replacement and all worked fine.
It is a pity not to use that version, as it is more efficient in terms of disk space and time wasted.
hi ivo it's my fault, you are right, as usual! I misread the final result because the script is indeed producing a correct tab delimitation of the fields, but also after the last one (i.e. adding a final tab after the last field), and that is 'disturbing' the import in some applications... How does enclosing the set statement in parentheses allow a sort of control? thanks again max
the batch below is a slightly modified version that avoids the trailing tab delimiter.
About the parentheses: they allow a little control, as
(set TAB=#)
becomes, after the # is replaced with TAB,
(set TAB=*******)
where * means <space> and the right parenthesis is pushed to the right by the TAB-generated spaces (in the view).
@echo off & setlocal EnableDelayedExpansion
type nul > file_tab.txt
(set TAB=#)
for /f "delims=" %%i in (file_spc.txt) do (
  set row=
  for %%j in (%%i) do (set row=!row!%%j!TAB!)
  set row=!row:~0,-1!
  echo.!row!>> file_tab.txt
)
:: end_of_batch
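To make the trailing-tab fix easier to follow, here is the inner loop of the batch above mirrored step by step in Python. It is only a sketch for illustration: build_row is a made-up name and TAB is assumed to hold one real tab character.

```python
TAB = "\t"


def build_row(tokens):
    """Append a TAB after every token, then drop the last character."""
    row = ""
    for token in tokens:
        row += token + TAB  # same idea as: set row=!row!%%j!TAB!
    return row[:-1]         # same idea as: set row=!row:~0,-1!


print(repr(build_row(["A", "B", "1", "3"])))
```

The slice removes exactly one character, so it only works because the delimiter is a single char, which is the case for TAB.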