Solved How to identify a pipe character in a pipe delimited file

December 9, 2014 at 11:31:26
Specs: unix
How do I identify a pipe character within a pipe delimited file in unix and change it so the validations will not error out?

✔ Best Answer
December 11, 2014 at 21:52:27
Sorry for the late post. My real job was getting in the way.

This Perl script expects each line to contain one URL that starts with http:// and ends at the last pipe symbol on the line. It splits the line at the first http:// into a two-element array. In the second array element, the regular expression changes every pipe symbol except the last one into a /. If you want a different character than /, change it in the regular expression. Keep in mind that the pipe symbol and the forward slash are special characters in the regular expression, so they are escaped.

Finally, it prints array element one, the split string, array element two and a newline. Lines that do not contain http:// (such as a header line) are printed unchanged.

#!/usr/local/bin/perl

use strict;
use warnings;

my $in_file = 'datafile.txt';
open(my $in_fh, '<', $in_file) or die "Could not open file $in_file: $!";
my $mysplit = 'http://';
while ( my $line = <$in_fh> )
   {
   chomp $line;
   # split each line into at most two elements at the first http://
   my @values = split(/\Q$mysplit\E/, $line, 2);
   if ( @values == 2 )
      {
      # this regex changes the second element so each | is an / up until
      # the last pipe
      $values[1] =~ s/\|(?=.*\|)/\//g;
      $line = "$values[0]$mysplit$values[1]";
      }
   # print out each line; lines without a URL pass through unchanged
   print "$line\n";
   }
close($in_fh);

message edited by nails



#1
December 9, 2014 at 12:51:27
It depends on the situation. Do you have an algorithm for determining which is the pipe that needs to be changed?

For example, if you have a file where the second pipe symbol has to be changed to an ampersand character:

this is column 1|and this | is column2 with a pipe|and this is column 3|

This command on Solaris:

sed 's/|/\&/2' datafile.txt

provides this output:

this is column 1|and this & is column2 with a pipe|and this is column 3|



#2
December 9, 2014 at 12:58:23
The issue arises when we parse the file, which is pipe delimited. One field holds a URL that itself contains pipes. We need to change the pipes in the URL to '/'. How do you identify a | inside a pipe-delimited field?

#3
December 9, 2014 at 13:10:51
How about showing example data?

#4
December 9, 2014 at 13:44:08
First|Last|URL|INID
David|Smith|http://www.computing.net/answers|nix/|123456

The issue is the | in the URL field between answers and nix. In this example there is one |, but there could be more than one, like ... http://www.computing.net|answers|nix

If you need more let me know.
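
A minimal sketch for at least locating the affected lines, assuming the first line is a header that carries the correct number of fields (as in the sample above). The function name find_extra_pipes is made up for illustration; it only reports line numbers and does not change anything:

```shell
# Report every line whose field count differs from the header's field count.
# Assumption: line 1 is a header with the correct number of fields.
find_extra_pipes() {
    awk -F'|' '
        NR == 1 { expected = NF; next }
        NF != expected {
            printf "line %d: %d fields, expected %d\n", NR, NF, expected
        }
    ' "$@"
}
```

Run as find_extra_pipes datafile.txt. On the two sample lines above it reports "line 2: 5 fields, expected 4", because the stray | splits the URL into two fields.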


#5
December 9, 2014 at 14:12:40
I still have not identified an algorithm that can be implemented, such as a rule for where the URL ends.

Can you identify the end of the URL? For example, does it always end in nix/? Or does the URL always end with /?

Is the number of columns from the end of the URL to the end of the line always the same? Your example line contains only 1.

message edited by nails


#6
December 9, 2014 at 14:53:09
That is the problem. The | could be anywhere in the URL field and could be there more than once. In this particular feed there are more fields than I showed; I just showed the field that I am having the error in. But the fields are always in the same positions throughout the file. In the real feed the URL field is the eighth field.
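
If the field positions really are fixed, one possible approach is to treat every field beyond the expected count as a piece of the URL and glue those pieces back together. This is only a sketch under that assumption: the helper name merge_url_field and its two arguments (the URL column and the expected number of fields per line) are hypothetical, and it assumes the URL is the only field that can contain stray pipes.

```shell
# merge_url_field URL_COL EXPECTED_FIELDS [FILE...]
# Hypothetical helper: rejoins the pieces of an over-split URL field with "/".
# Assumption: any surplus fields on a line belong to the URL column.
merge_url_field() {
    url_col=$1
    want_fields=$2
    shift 2
    awk -F'|' -v col="$url_col" -v want="$want_fields" '
        {
            extra = NF - want                 # number of stray pipes on this line
            if (extra <= 0) { print; next }   # nothing to repair
            out = $1
            for (i = 2; i <= NF; i++) {
                # fields col+1 .. col+extra are continuations of the URL
                sep = (i > col && i <= col + extra) ? "/" : "|"
                out = (out sep $i)
            }
            print out
        }
    ' "$@"
}
```

For the four-column sample posted earlier the call would be merge_url_field 3 4 datafile.txt; for the real feed the first argument would be 8 and the second the real field count.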

#7
December 9, 2014 at 20:55:28
I am assuming the URL always starts with http://. If you can't tell me what the URL ends with, I don't see how I can help you.

You might try posting some real data. Maybe I can see a pattern.


#8
December 10, 2014 at 10:30:42
The URL is surely everything from the "http" up to (but not including) the final "|" in the line.

#9
December 10, 2014 at 10:56:36
Correct. That is right.
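
The rule agreed on here (the URL runs from http:// up to, but not including, the final | on the line) could also be sketched with sed alone. This assumes each line holds at most one URL and that exactly one field follows it; the function name fix_url_pipes is made up for illustration.

```shell
# Replace every pipe that follows "http://" with "/", except the last pipe
# on the line, by repeating one substitution until it no longer matches.
# Assumptions: one URL per line, and exactly one field after the URL.
fix_url_pipes() {
    sed -e ':again' \
        -e 's#\(http://[^|]*\)|\(.*|\)#\1/\2#' \
        -e 't again' "$@"
}
```

Run as fix_url_pipes datafile.txt. Each pass replaces the first pipe after http:// as long as another pipe still follows it, so the sample line becomes David|Smith|http://www.computing.net/answers/nix/|123456 and lines without http:// are left alone.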
