DB data type — Varchar vs Text dilemma
On a quick look, it feels like varchar(n) is better, because it feels like we are saving disk space by specifying exactly what we are going to use. Is it really worth it ? Why can’t we keep it simple and call it text
data type ?
Postgres Character types
More details info — postgres docs
Mysql Character types
More details info — mysql docs
character(n) is slow
Let’s rule out character(n) unless the problem we are trying to solve some rare case scenario. Based on the postgres document,character(n)
has no performance advantages in PostgreSQL In fact character(n)
is usually the slowest of the three because of its additional storage costs.
varchar(n) is slow
varchar(n) are always problematic when dealing with resizing at a later stage. varchar(n) also has an overhead while writing data — each time, database checks whether data is within limits.
Does varchar(n) takes less storage than text
Mysql docs clearly calls this out.,
L
represents the actual length in bytes of a given string value.
- Text ->
L
+ 2 bytes, whereL
< 2^16 - varchar ->
L
+ 1 bytes if column values require 0 − 255 bytes,L
+ 2 bytes if values may require more than 255 bytes
When to use varchar
- If the field needs a database level validation on the string length., let’s say social security number
OLAP databases
Clickhouse, an OLAP database, for instance, ignores VARCHAR kind of data type declaration and behind the scenes handles them as String.
The String type replaces the types VARCHAR, BLOB, CLOB, and others from other DBMSs.
When creating tables, numeric parameters for string fields can be set (e.g. VARCHAR(255)
), but ClickHouse ignores them.
Conclusion
Using text by default and using varchar in places that really need them simplifies the implementation. Overthinking each field how long it should be and when it hits the limit increasing the size feels over engineering if the advantage of doing such a limit doesn’t have huge page off.
Do you have a different perspective ? Let me know in the comments.