Alternate sed substitute demarcation characters

Usually you are taught to use the / character to mark off the search and replacement parts of a substitute command in sed, perl or any other language that has one. But you may not know that you can actually use almost any other character instead. One of the most common places you'll see this is when someone wants to do a replacement on a file path.

echo /bin/ls | sed 's|/bin|/usr/local/bin|'

In the example above, the pipe character (|) is used in place of the forward slash (/) for demarcating the search and replacement parts of the 's' command. Other characters that are commonly used include the comma (,) and the number sign (#). This is a handy feature when either your search or replacement data has the / in it and you want to avoid escaping it. But sed will accept any character other than a backslash or a newline here, including letters and numbers.
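As a quick sketch of the point above, the same path substitution can be written with several delimiters. The variable names are only for illustration. Each command should give the same result, and the slash form is included to show the escaping that the alternatives avoid.

```shell
# Conventional slashes: every / inside the pattern and replacement needs a backslash
slash=$(echo /bin/ls | sed 's/\/bin/\/usr\/local\/bin/')
# Alternate delimiters: nothing to escape
pipe=$(echo /bin/ls | sed 's|/bin|/usr/local/bin|')
comma=$(echo /bin/ls | sed 's,/bin,/usr/local/bin,')
pound=$(echo /bin/ls | sed 's#/bin#/usr/local/bin#')
printf '%s\n' "$slash" "$pipe" "$comma" "$pound"
```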

sed statement<<<dated

In the above command, the letter t acts as the demarcation character. For a word to work this way, it must start with an s and contain exactly 3 copies of the demarcation character, with the last one at the very end. So the pattern is sN[^N]*N[^N]*N where N is that character. I was curious what other words could work as expressions that generate another word. Finding out takes two commands. The first finds English words that could be used as substitute expressions.
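To make the parsing concrete, here is a small sketch that runs the word form of the statement example next to the same substitution written with slashes. It assumes nothing beyond sed itself, and the variable names are just for illustration.

```shell
# s | t | a | t | emen | t  ->  command s, delimiter t, pattern "a", replacement "emen"
word_form=$(echo dated | sed statement)
slash_form=$(echo dated | sed 's/a/emen/')
echo "$word_form"
if [ "$word_form" = "$slash_form" ]; then
    echo "both forms agree"
fi
```

With the parsing confirmed, on to the word search: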

look s | egrep --color=no "^s(.).*\1.*\1$" | grep -v "'"

The look command searches the dictionary file for words starting with the letter 's'. The egrep expression then saves the 2nd character in a group that can be referenced again with the \1 sequence. The rest of the expression asks for any run of characters followed by the saved character, and then another run of characters followed by the saved character at the end of the string. The final grep -v drops any word containing an apostrophe. This produces the following words

sanatoria
sanitaria
sarcomata
sarsaparilla
savanna
secede
secrete
secretive
segregate
selective
selvedge
semipermeable
sentence
sentience
sentimentalize
sequence
serenade
serene
serpentine
serviceable
serviette
severance
severe
sewerage
stateliest
statement
stealthiest
stoutest
straightest
straightjacket
straitjacket
strategist
streetlight
stretchiest
strictest
structuralist

Some of the words have more than 3 of the 2nd character in them because .* will happily match that character too. Unfortunately, doing something like s(.)[^\1]+ doesn't work. Back references aren't expanded inside a bracket expression, so [^\1]+ becomes "anything but a backslash or the literal number 1, 1 or more times".
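One way around this, sketched here rather than taken from the original commands, is to let awk split each word on its own 2nd character and count the pieces. The function name sed_words is made up for this example, and it assumes lowercase words arriving one per line on standard input (for instance from look s). It also insists on a non-empty search part, which is my assumption about what makes a usable expression.

```shell
# sed_words: hypothetical helper, keeps only words shaped like sNsearchNreplacementN
sed_words() {
    awk '/^s[a-z]+$/ {
        delim = substr($0, 2, 1)        # the 2nd character is the delimiter
        rest = substr($0, 3)            # everything after it
        n = split(rest, part, delim)    # n pieces means n - 1 further delimiters
        # exactly 2 more delimiters, a non-empty search part, nothing after the last one
        if (n == 3 && part[1] != "" && part[3] == "") print
    }'
}

printf '%s\n' statement sarsaparilla serene seethe sentence | sed_words
```

Here sarsaparilla is dropped for having a 4th a, and seethe for its empty search part.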

Not all of these can be used to generate other English words, but the list is interesting nonetheless. Now with that set of words we need to run each of them and see if it generates another word that look can find. There are many ways of doing this. I first tried the monkey way:

for expression in $( look s | egrep --color=no "^s(.)[^\1]+\1[^\1]+\1$" | grep -v "'" ) ; do subsection=$( sed "s/^s\(.\)\(.*\)\1.*\1$/\2/" <<<$expression ) ; for word in $( look . | grep --color=no $subsection ) ; do newword=$( sed $expression<<<$word ) ; grep -q "^${newword}$" words && echo "sed $expression<<<$word -> $newword" ; done ; done | tee sed-word-sub-word-combos.txt

This is rather complex and really isn't necessary. There isn't a reason why you can't just run sed on each of the words and see if it generates something else.

for expression in $( look s | egrep --color=no "^s(.)[^\1]+\1[^\1]+\1$" | grep -v "'" ) ; do for word in $( look . | egrep "^[a-z]+$" ) ; do result=$( sed "$expression" <<<"$word" ) ; [[ "$word" != "$result" ]] && look . | grep -qx "$result" && echo "sed $expression<<<$word -> $result" ; done ; done

Basically, for each word that is a valid sed substitute expression, loop over the list of English words that match ^[a-z]+$ and grep for the result in that list. You can't do a "look $result" because that can match a word that begins with the result even if the result is not a word itself.
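The same idea can be tried out on a tiny scale. The sketch below is not the command used to produce the results. The find_combos function and the four-word list are made up for illustration, and it checks results with grep -Fx (whole-line, fixed-string match) so that a prefix of a longer word does not count as a hit.

```shell
# find_combos EXPRESSIONS WORDLIST: hypothetical helper, both arguments are
# files with one word per line
find_combos() {
    while IFS= read -r expression; do
        while IFS= read -r word; do
            result=$(printf '%s\n' "$word" | sed "$expression" 2>/dev/null)
            # keep the hit only if sed changed the word and the result is a whole line in the list
            if [ "$word" != "$result" ] && grep -qFx -- "$result" "$2"; then
                echo "sed $expression<<<$word -> $result"
            fi
        done < "$2"
    done < "$1"
}

expressions=$(mktemp)
wordlist=$(mktemp)
printf '%s\n' statement > "$expressions"
printf '%s\n' date dated demented statement > "$wordlist"
find_combos "$expressions" "$wordlist"
rm -f "$expressions" "$wordlist"
```

In this toy list, date turns into demente, which only a prefix match would accept because demented is in the list.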

See this file for the results

The command can take a long time to run so I'd suggest just looking at the page linked to above for the results. Here are a few of the interesting combos where all three words are kinda related:

You could also do this with other sed commands like the y/// command, which translates characters from part1 to those in part2. However there aren't any words that start with y and have the yN.*N.*N pattern. Finding other examples is an exercise left to the reader.
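For reference, here is a short sketch of y with the usual slashes and with an alternate delimiter. The input word is just an example, and y needs both character lists to be the same length.

```shell
# y maps each character of the first list to the character in the same position of the second
slash=$(echo hello | sed 'y/el/ip/')
comma=$(echo hello | sed 'y,el,ip,')
echo "$slash $comma"
```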

Created: 2012-06-02
