I had taught myself Python over the summer, only a couple of months earlier, and I was itching for an excuse to apply it to a project. I had also recently stumbled upon a few articles detailing natural language processing techniques using Python's NLTK, and I wanted to take my own shot at it. From these ambitions, FreeSpeak was born.
FreeSpeak is a Python Flask app that uses natural language processing techniques to recognize meaning in semi-unstructured English sentences and then convert them down to x86 assembly instructions. FreeSpeak caches what it learns, so the more tasks it is asked to handle, the "smarter" it gets at recognizing commands.
While the initial goal of the project was to handle completely unstructured language, I quickly realized that this problem was a bit more complex than I had given it credit for. However, it wasn't a complete failure.
FreeSpeak starts off knowing a few keywords, each with an associated meaning. When a sentence is submitted, it goes through each word and checks whether it matches a known, labeled keyword. If it doesn't, it uses a thesaurus API to check whether the word is a synonym of a known keyword. If it is, the word is added to a list of known synonyms, which lets FreeSpeak process it faster in the future (hence the quotation marks around "smarter").
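As a rough illustration (not FreeSpeak's actual code), the lookup described above might be sketched like this. The seed keywords, the `Categorizer` class, and the `lookup_synonyms` function are all hypothetical; in particular, `lookup_synonyms` is a local stand-in for whatever thesaurus API the real app calls.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical seed keywords mapped to their categories.
KEYWORDS: Dict[str, str] = {
    "create": "task",
    "print": "task",
    "integer": "type",
    "list": "structure",
}

# Stand-in data for a thesaurus service (assumption: the real app
# would query a remote API here instead of a local dict).
FAKE_THESAURUS: Dict[str, List[str]] = {
    "make": ["build", "create"],
    "show": ["display", "print"],
}


def lookup_synonyms(word: str) -> List[str]:
    """Return the synonyms of `word` (hypothetical thesaurus call)."""
    return FAKE_THESAURUS.get(word, [])


class Categorizer:
    def __init__(self, keywords: Dict[str, str],
                 thesaurus: Callable[[str], List[str]]) -> None:
        self.keywords = dict(keywords)
        self.thesaurus = thesaurus
        # Learned synonyms: unknown word -> known keyword it maps to.
        self.synonym_cache: Dict[str, str] = {}

    def label(self, word: str) -> Tuple[str, Optional[str]]:
        """Return (keyword, category); category is None if unrecognized."""
        word = word.lower()
        if word in self.keywords:                 # 1. direct keyword match
            return word, self.keywords[word]
        if word in self.synonym_cache:            # 2. synonym seen before
            keyword = self.synonym_cache[word]
            return keyword, self.keywords[keyword]
        for candidate in self.thesaurus(word):    # 3. ask the thesaurus
            if candidate in self.keywords:
                self.synonym_cache[word] = candidate  # remember for next time
                return candidate, self.keywords[candidate]
        return word, None

    def label_sentence(self, sentence: str) -> List[Tuple[str, Optional[str]]]:
        return [self.label(word) for word in sentence.split()]


categorizer = Categorizer(KEYWORDS, lookup_synonyms)
labels = categorizer.label_sentence("make integer banana")
```

After the first call, "make" sits in `synonym_cache`, so labeling it again skips the thesaurus lookup entirely. That is the only sense in which this sketch gets "smarter".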
After every word has been labeled, a sort of state-machine-like process is used to determine context. Different categories of words, such as "type", "structure", and "task", are used as keys for ordering so that commands can be extrapolated. In this way, it is actually similar to a more traditional lexer in that it uses lookaheads and ordering to determine whether certain rules should apply.
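One way to picture that pass is the minimal sketch below, which is an assumption about the general shape rather than FreeSpeak's real logic. It scans the labeled tokens for a "task" keyword and then looks ahead to gather the "type", "structure", and leftover words that belong to it. The `extract_commands` function, the `STOPWORDS` set, and the "operands" key are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

Token = Tuple[str, Optional[str]]  # (word, category); None means unlabeled

# Hypothetical filler words that carry no meaning for a command.
STOPWORDS = {"a", "an", "the", "called", "named", "of"}


def extract_commands(tokens: List[Token]) -> List[Dict[str, Any]]:
    """Group labeled tokens into commands, one per task keyword."""
    commands: List[Dict[str, Any]] = []
    i = 0
    while i < len(tokens):
        word, category = tokens[i]
        if category != "task":
            i += 1  # state: searching for the next task keyword
            continue
        command: Dict[str, Any] = {"task": word, "params": {}}
        i += 1
        # state: collecting parameters; look ahead until the next task
        while i < len(tokens) and tokens[i][1] != "task":
            next_word, next_category = tokens[i]
            if next_category in ("type", "structure"):
                command["params"][next_category] = next_word
            elif next_category is None and next_word not in STOPWORDS:
                command["params"].setdefault("operands", []).append(next_word)
            i += 1
        commands.append(command)
    return commands


tokens = [
    ("create", "task"), ("an", None), ("integer", "type"),
    ("called", None), ("total", None),
    ("print", "task"), ("total", None),
]
commands = extract_commands(tokens)
```

Here the two task keywords split the token stream into two commands: a "create" with a type and one operand, and a "print" with a single operand.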
After context has been derived (which just means that a task keyword was found and has organized, associated parameters), it is just a matter of handling the task and converting it down to x86 assembly.
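To make that last step concrete, here is a hedged sketch of what a tiny task handler could look like. The command dictionaries, the two supported tasks, and the NASM-style 32-bit output are all assumptions chosen for illustration; the source does not say what FreeSpeak's real tasks or assembly output look like.

```python
from typing import Any, Dict, List


def emit_assembly(commands: List[Dict[str, Any]]) -> str:
    """Translate simple commands into NASM-style x86 text (illustrative only)."""
    data_lines: List[str] = []
    text_lines: List[str] = []
    for command in commands:
        task = command["task"]
        operands = command["params"].get("operands", [])
        if task == "create":
            # Reserve one zero-initialized 32-bit value for the variable.
            (name,) = operands
            data_lines.append(f"{name}: dd 0")
        elif task == "add":
            # Load the variable, add an immediate, and store it back.
            amount, name = operands
            text_lines.append(f"mov eax, [{name}]")
            text_lines.append(f"add eax, {amount}")
            text_lines.append(f"mov [{name}], eax")
        else:
            raise ValueError(f"no handler for task {task!r}")
    lines = ["section .data"]
    lines += ["    " + line for line in data_lines]
    lines += ["section .text"]
    lines += ["    " + line for line in text_lines]
    return "\n".join(lines)


asm = emit_assembly([
    {"task": "create", "params": {"type": "integer", "operands": ["total"]}},
    {"task": "add", "params": {"operands": ["5", "total"]}},
])
print(asm)
```

Keeping data declarations and instructions in separate lists lets the handler emit both sections in order, whatever order the commands arrive in.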
I created FreeSpeak before ever taking a Programming Languages class, so when I took one the following semester, I was surprised by how closely what I had done resembled the common techniques for lexing and parsing mainstream programming languages. I realized that I had essentially created a form of lexer, which I called a "categorizer", that worked together with a "task handler" to give the input "context", which really meant translating it into a sort of bytecode.
I tried to hardcode as little as possible in this project, so keywords are fluid and can be added by editing simple wordlists. I figured that the exact meaning of a command word mattered less than identifying all of the parts of the command, so the classification of a keyword (or of a synonym of a keyword) proved to be more important.
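A wordlist-driven setup like this could be sketched as follows. The file layout (one `<category>.txt` file per category, one keyword per line, `#` for comments) is an assumption made for the example, and `parse_wordlist` and `load_wordlists` are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict


def parse_wordlist(text: str, category: str) -> Dict[str, str]:
    """Map every keyword in one wordlist to the given category."""
    keywords: Dict[str, str] = {}
    for line in text.splitlines():
        word = line.strip().lower()
        if word and not word.startswith("#"):  # skip blanks and comments
            keywords[word] = category
    return keywords


def load_wordlists(directory: Path) -> Dict[str, str]:
    """Read every `<category>.txt` file in `directory` into one keyword map."""
    keywords: Dict[str, str] = {}
    for path in sorted(directory.glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        keywords.update(parse_wordlist(text, category=path.stem))
    return keywords


task_words = parse_wordlist("# task keywords\ncreate\nPrint\n\n", "task")
```

With this layout, teaching the program a new keyword means adding a line to a text file, and the code only ever needs to know which category a word falls into.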
In the very loosest sense of the term, FreeSpeak could even be considered almost esoterically Turing complete. It can store data in variables and other data structures, it offers several forms of control flow, it has scope-aware keywords (this, self, etc.), and it supports several types that are either dynamically assumed or explicitly stated. However, I would hesitate to call it that: in its current state it is prone to many bugs that I wasn't able to iron out due to time constraints, and there are many more features that I originally wanted to add.