I was reading a book on Domain Specific Languages through my Safari Online subscription on my iPad, hoping to learn more about them and about how to parse them.
I read a big chunk of it, and it is a quick and enjoyable read, but I am not sure I got much out of it. The author suggests building a semantic model as the result of parsing, so that parsing is separated from action. This sounds good, and for some of my languages of interest I think it will make a lot of sense. But for others I am generating documents or code, and, as the author mentions, in that case the semantic model is generally the parse tree itself.
The other big thing I got from the book is that a lot of the standard parsing techniques have issues and can be complicated. This suggests to me that the Pratt parsing I am pursuing might be a good idea. Apparently regular languages (regex) can’t handle nesting. Context-free languages, which handle nesting and are what most parsers deal with, seem to have issues with variable assignments (bizarre to me), and context-sensitive languages apparently have no general-purpose parsing technique. Also, mixing different languages together, as I want to do, is not supported in general. So all of that suggests that what I am learning and doing is not totally stupid.
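To make the Pratt idea concrete for myself, here is a minimal sketch of a Pratt (top-down operator precedence) parser for plain arithmetic. It is not from the book; the grammar, the binding-power numbers, and the names tokenize, parse, parse_expression, and evaluate are all my own assumptions for illustration. Parentheses are handled by recursion, which is exactly the nesting a regex cannot follow.

```python
import re

# Assumed toy grammar: integers, + - * / and parentheses.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

# Assumed binding powers: a higher number binds tighter.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}
PREFIX_POWER = 30  # unary minus binds tighter than any infix operator


def tokenize(text):
    """Split the input into a flat list of number and operator strings."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens, min_bp=0):
    """Consume tokens from the front of the list and return a nested tuple tree."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        left = parse(tokens, 0)  # nesting is just a recursive call
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
    elif token == "-":
        left = ("neg", parse(tokens, PREFIX_POWER))
    elif token.isdigit():
        left = int(token)
    else:
        raise SyntaxError(f"unexpected token {token!r}")

    # Keep absorbing infix operators that bind tighter than our caller's.
    while tokens and tokens[0] in BINDING_POWER and BINDING_POWER[tokens[0]] > min_bp:
        op = tokens.pop(0)
        right = parse(tokens, BINDING_POWER[op])
        left = (op, left, right)
    return left


def parse_expression(text):
    """Parse a whole string and reject leftover tokens."""
    tokens = tokenize(text)
    tree = parse(tokens)
    if tokens:
        raise SyntaxError(f"unexpected token {tokens[0]!r}")
    return tree


def evaluate(node):
    """Walk the tree; this is the 'action' kept separate from parsing."""
    if isinstance(node, int):
        return node
    if node[0] == "neg":
        return -evaluate(node[1])
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a / b
```

With this sketch, parse_expression("1 + 2 * 3") gives ("+", 1, ("*", 2, 3)), and evaluating "(1 + 2) * 3" gives 9. Adding an operator means adding one entry to the binding-power table, which is the part of Pratt parsing I find appealing.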
One little side note from my reading was the pair of languages Graphviz and SCSS. Graphviz uses a simple language (DOT) for constructing visualizations of graphs. That sounds like a nice tool to have. There are also PSTricks packages that can do this. It might be fun to compare them.
A similar project that I am inspired to pursue is an HTML markup variant, which I am currently calling htmPlus; I leave off the L in html to indicate that it is like HTML but with a little less noise. My current idea is to create a syntax with as little extra as necessary while still having all the tags of HTML. I do not like the wiki syntaxes I see, since they don’t really teach HTML at all. Using # for a numbered item or *bolded* for bold text hides which element is being used, which gets in the way of learning to read HTML, write CSS, or use jQuery selectors. So I want a language that stays very close to HTML. But the idea of typing out
<div class="contthis" id="me">some stuff</div>
is painful (how do GUIs deal with attributes?). Instead, my idea is
\div.contthis#me some stuff //
Same content, but less noise. The closing “bracket” could also be //div or //.contthis or //#me, and it would close all the tags opened in between. That cuts down on useless closing tags.
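Here is a rough sketch of how such a translator might work, just to check that the idea hangs together. Everything in it is an assumption on my part: the names TOKEN, open_tag, matches, and translate are made up, one space after an opening tag and before a closer is swallowed, and tags still open at the end of the input are closed automatically. It also treats every // as a closer, so a URL or a script comment in the text would confuse it.

```python
import re
from html import escape

# Assumed token shapes: \tag.class#id opens, // or //tag, //.class, //#id closes.
TOKEN = re.compile(
    r"\\(?P<tag>[A-Za-z][A-Za-z0-9]*)(?P<sel>(?:[.#][\w-]+)*)\s?"
    r"|\s?//(?P<close>[.#]?[\w-]+)?"
)


def open_tag(tag, classes, ids):
    """Build the HTML opening tag from the parsed pieces."""
    attrs = ""
    if classes:
        class_list = " ".join(classes)
        attrs += f' class="{class_list}"'
    if ids:
        attrs += f' id="{ids[0]}"'
    return f"<{tag}{attrs}>"


def matches(entry, target):
    """Does an open element match a closer such as div, .contthis or #me?"""
    tag, classes, ids = entry
    if target.startswith("."):
        return target[1:] in classes
    if target.startswith("#"):
        return target[1:] in ids
    return target == tag


def translate(source):
    """Translate htmPlus-style text into HTML (sketch only)."""
    out, stack, pos = [], [], 0
    for m in TOKEN.finditer(source):
        out.append(escape(source[pos:m.start()], quote=False))
        pos = m.end()
        if m.group("tag"):
            sel = m.group("sel")
            classes = re.findall(r"\.([\w-]+)", sel)
            ids = re.findall(r"#([\w-]+)", sel)
            stack.append((m.group("tag"), classes, ids))
            out.append(open_tag(m.group("tag"), classes, ids))
            continue
        target = m.group("close")
        if not stack:
            raise ValueError("closer found with no open tag")
        if target is None:
            # A bare // closes only the innermost open tag.
            out.append(f"</{stack.pop()[0]}>")
            continue
        if not any(matches(entry, target) for entry in stack):
            raise ValueError(f"no open tag matches //{target}")
        # A named closer closes everything up to and including its match.
        while True:
            entry = stack.pop()
            out.append(f"</{entry[0]}>")
            if matches(entry, target):
                break
    out.append(escape(source[pos:], quote=False))
    while stack:  # assumption: close whatever is still open at the end
        out.append(f"</{stack.pop()[0]}>")
    return "".join(out)
```

With this sketch, translate(r"\div.contthis#me some stuff //") returns <div class="contthis" id="me">some stuff</div>, and a single //#me at the end of a nested list closes the list items, the list, and the div in one go.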
I am torn about paragraphs. They should be written as \p sentences… But I really like using two newlines to create a paragraph, even though that gives a misleading impression of how HTML is written. I guess \p would still be needed for classes and ids, such as \p#nifty for <p id="nifty">.
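One way to have both would be to treat the blank-line form as shorthand that gets rewritten into the explicit \p form before anything else happens. A minimal sketch, assuming a hypothetical helper named desugar_paragraphs and assuming that any block already starting with a backslash tag is left alone:

```python
import re


def desugar_paragraphs(source):
    """Rewrite blank-line-separated blocks into explicit \\p ... // blocks."""
    blocks = re.split(r"\n\s*\n", source.strip())
    out = []
    for block in blocks:
        block = block.strip()
        if not block:
            continue
        if block.startswith("\\"):
            # Already tagged (for example \p#nifty), so keep it as written.
            out.append(block)
        else:
            out.append("\\p " + block + " //")
    return "\n\n".join(out)
```

The obvious weakness is a paragraph that happens to begin with an inline tag such as \em, which this sketch would leave unwrapped. That is the kind of edge case that keeps me torn.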
Also in there would be a language for doing some useful programmatic replacement. While trying to write up a page on a diet I was on recently, I noticed that I wanted to write up recipes with ingredients, directions, calories, fiber, maybe even the cost of ingredients, etc. And you can well imagine wanting the ingredient nutrients to bubble up from their own level into totals for the recipe, and wanting them all formatted in a certain way. So you have to decide on the markup structure and then put in the data. If you later decide to change the structure, that is hard. And there is less noise when reading simple data structures. Some would solve this with a server-side script serving up data and using templates there, but I think it would be nice to be able to write up pages as stand-alone items. Thus, htmPlus fills the gap. Write out the data and templates in the markup and then translate it into a proper HTML page. And, of course, if one decides to grab the data and put it elsewhere, then it should be easy to refactor, since the data will be in a data structure (JSON, to be specific).
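As a rough illustration of the data-plus-template idea, here is a sketch in which the recipe lives in JSON, the totals bubble up from the ingredients, and the HTML structure lives only in the templates. The recipe, the field names, and the numbers are made-up placeholders (not real nutrition data), and the names totals and render are hypothetical.

```python
import json
from html import escape

# Placeholder data: the values are invented for illustration only.
RECIPE_JSON = """
{
  "name": "Oatmeal",
  "ingredients": [
    {"name": "oats", "calories": 150, "fiber": 4},
    {"name": "banana", "calories": 100, "fiber": 3}
  ]
}
"""

# The markup structure lives here, so changing it never touches the data.
ROW_TEMPLATE = "<li>{name}: {calories} kcal, {fiber} g fiber</li>"
PAGE_TEMPLATE = (
    "<h2>{name}</h2>\n"
    "<ul>\n{rows}\n</ul>\n"
    "<p>Total: {calories} kcal, {fiber} g fiber</p>"
)


def totals(recipe):
    """Bubble ingredient nutrients up to the recipe level."""
    return {
        key: sum(item[key] for item in recipe["ingredients"])
        for key in ("calories", "fiber")
    }


def render(recipe):
    """Fill the templates with the data and the computed totals."""
    rows = "\n".join(
        ROW_TEMPLATE.format(
            name=escape(item["name"]),
            calories=item["calories"],
            fiber=item["fiber"],
        )
        for item in recipe["ingredients"]
    )
    return PAGE_TEMPLATE.format(name=escape(recipe["name"]), rows=rows, **totals(recipe))


recipe = json.loads(RECIPE_JSON)
page = render(recipe)
```

Switching from a list to a table would mean editing only the two template strings, and the JSON could be lifted out and served from somewhere else without changes.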
The tags of htmPlus will look superficially like TeX with all of the backslashes, but the similarity ends there, and I am fine with that. To mix in other languages, I would use something like \m math stuff // or \js script code //. The \js code would not end up in the HTML page; it would be used in generating the page, run through my own custom js parsing (for security). To do scripting in the page itself, one would use \script code //, though there the // closer would clash with js comments. I am thinking of a comment structure of “comments” that I could use in these various languages. I am not sure yet, but I prefer syntax built from keys that do not need shift.
Just some random thoughts from today.