The Data Studio

Words You Should Never Use

This page is inspired by the wonderful Plain English Campaign, especially their A-Z of Alternative Words.

Bad Word | What You Mean | Reason
Best Practice | because + <a damned-good reason> | "Best Practice" was a good idea in the first place, but it has been corrupted. What "Best Practice" means now is a set of assertions that allow your boss, with a self-satisfied air, to say, "we follow Best Practice". Because he wants to do this, the vendors publish "Best Practice" assertions. The term "Best Practice" closes down discussion. If I say "why do you do that?", the answer is "it's Best Practice". If I say "who says so?", the answer is "everybody knows it's best practice", with the implication being "what kind of consultant are you if you don't know that?". That's when I need <a damned-good reason>.
Anti-Pattern | Your solution is bad | "Anti-Pattern", like "Best Practice", was a good idea in the first place, but it has been corrupted. What "Anti-Pattern" means now is a set of assertions that allow your Enterprise Architect, with a self-satisfied air, to say that your solution is an "Anti-Pattern". The term "Anti-Pattern" closes down discussion. If I say "what, exactly, is the problem with my solution?", the answer is "it's an Anti-Pattern". If I say "who says so?", the answer is "everybody knows it's an Anti-Pattern", with the implication being "what kind of consultant are you if you don't know that?". I need to defend my solution, and that's OK, but just calling it an anti-pattern does not convince me.
ingest | load | Your database does not eat your data; it stores your data safely so that you can see it whenever you want to. Your dog might "ingest" a sock, or a gold ring. In the first case you would hope that the sock didn't kill the dog; in the second you would have to endure a couple of days of unpleasantness to get the gold ring back. You don't want to go through that kind of unpleasantness to get your data back. (Although, with Hive it's an appropriate analogy.) In cases where "ingest" is the appropriate word you should say "eat" anyway.
canonical | because I say so | This is another word like "best practice"; it is used to close down discussion. It suggests that the "canonical" version is the one true version and we must follow it. I want to know who says so, and why.
consume | read and delete | "Consume" is often used to describe the processing of a file, in which the file is read and then deleted by a program. (The file may be a message but probably comes to rest on some storage device at some point.) This is a processing model that assumes complete isolation of the program doing the consuming, rather than a model in which the file is shared and used by several programs. That is OK if it reflects the actual requirement, but often the assumption is made without any regard to actual requirements. Sometimes "consume" is used more loosely and the file is read but not deleted. Just don't use the word "consume". I want to be specific about how the file is required to be used, so "read" and "delete" are better words to use.
serialize | copy | To serialise means to copy the state of an object, that is the values of its properties, to some persistent storage. It is a pretentious way of saying "copy" and not really applicable to the bulk copying of files, so do not say "serialise" in this context.
deserialize | copy | To deserialise means to copy the state of an object, that is the values of its properties, from some persistent storage. It is a pretentious way of saying "copy" and not really applicable to the bulk copying of files, so do not say "deserialise" in this context.
serde | copy | This means "serialise/deserialise". "Serde" is even uglier than that mouthful. But in the Hadoop world "serde" is used as a noun to describe a process that interprets a file format and copies the data from the file to a table. I think it is the worst case of a gratuitous abbreviation that I have ever heard, and there are plenty of bad ones out there. Use short simple words instead.
temporal accessor | date format | Processing dates is messy. There are some nice functions/classes to format dates, extract parts of a date, compare dates, and convert them from one form to another. The Java documentation on the TemporalAccessor interface says "This interface is a framework-level interface that should not be widely used in application code". That is good advice, but I would change "not be widely used" to "never be used".
persist, persistence | save | Persistence suggests that the system is just looking after some data for an application temporarily. In some systems this is appropriate, but in most commercial and government applications the data needs to be saved permanently and it needs to be shared by many applications. The data is the centre; the applications are satellites around the data. Persistence sends the wrong message. It is also a long and fancy word where a short word is better.
tuple | row | If you are a mathematician you can say tuple (if you like). If you are an application developer say "row".
relation | table | If you are a mathematician you can say relation (if you like). If you are an application developer say "table", unless you are talking about your auntie.
schema-on-read | Don't say anything. | There is no such thing as schema-on-read. Either you defined a schema before you wrote the data to the database, or you didn't define a schema. Not having a schema may be sensible if your data just consists of streams of text, but this is rare. Schema-on-read is a sound-bite used to sell to Dumas the Gullible and his peers. He thinks that designing a schema is a pain. He wants to hear that he can avoid this pain, so the vendor tells him he can with "schema-on-read". Schema-on-read is a nonsense for most commercial data.
going forward | Just don't say anything. | If you have a plan, it relates to the future. Might you be planning to change the past? If it doesn't make sense to say "going backwards" in what you are saying, then don't bother to say "going forwards"; it is redundant, just noise.
semantic exception | say what you mean | "Semantic exception" means an error of meaning. What kind of pretentious language is that? It's recursive pretentious language because it means "say what you mean".
risk-based | Can we get away with it? | A UK company I worked for had a "risk-based approach" to Data Protection legislation. They broke the law, but they knew that the enforcement was weak and the penalties small, so they carried on breaking the law and got quite angry with anyone who suggested that they should stop.
two-dot-oh (2.0) | I have had a complete creativity failure | This is another one that may have been OK the first time it was used to mean "the second major release, still quite fresh but with fewer bugs and a better-thought-out architecture". Now it tells you that the speaker or writer has no imagination. One bad example was the pot-boiler from Bill Inmon and friends: "DW2.0". This title also breaks the "no gratuitous abbreviations" rule. An even worse example is a proposal from some UK Members of Parliament for Common Market 2.0 which is not a new idea and which satisfies almost no-one.
reach out | send an email | What this really means is "I'm trying to sell you something but I'm going to pretend that I really care about you and your problems and your needs". When you hear someone say "reach out", your insincerity alarm should instantly wake up the neighborhood. However, as it says, here you do have an exemption if you are a member of The Four Tops.
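
For the "temporal accessor" entry above, here is a minimal sketch of what "date format" can look like in Java application code without ever naming TemporalAccessor. It assumes the input arrives as an ISO-8601 date string and that the wanted output is day/month/year; the class name DateFormatExample and the helper toDayMonthYear are made up for illustration and do not come from any library.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateFormatExample {

    // Assumed output format for this sketch: day/month/year.
    private static final DateTimeFormatter DAY_MONTH_YEAR =
            DateTimeFormatter.ofPattern("dd/MM/yyyy");

    // Hypothetical helper: read an ISO-8601 date (e.g. "2019-03-29")
    // and write it out again in day/month/year form.
    static String toDayMonthYear(String isoDate) {
        // LocalDate is a concrete type; application code works with it
        // directly rather than with the framework-level interface.
        LocalDate date = LocalDate.parse(isoDate);
        return date.format(DAY_MONTH_YEAR);
    }

    public static void main(String[] args) {
        System.out.println(toDayMonthYear("2019-03-29")); // prints 29/03/2019
    }
}
```

The sketch uses only LocalDate and DateTimeFormatter, so the words "date" and "format" are all a reader needs to follow it.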