In short, I will dismiss the notion that float provides a realistic advantage in storage of large numbers. (Also, I had a blog title idea too catchy to pass on.)
I'll avoid recapping in full my earlier post about the dangerous consequences of rounding with the float data type. The float data type is still popular with developers and database designers because of its supposed advantages in data storage size. (Also, some code-first modelers use float as a default. Also dangerous. And lazy.)
Too often, developers and database designers ignore the large potential for inconsistent rounding with the float (and real) data types, whereas a decimal(p,s) data type is not only consistent in storage, it also makes the logical intent transparent to the development team. There are no surprises from truncation of digits to the right of the decimal point with the decimal data type.
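Here's a quick illustration of the kind of surprise I mean (my own minimal sketch, separate from the test lab below): add 0.1 to a float ten times and it never quite reaches 1, while a decimal stays exact.

declare @f float = 0, @d decimal(19,3) = 0, @i int = 0;
while @i < 10
begin
    set @f = @f + 0.1;  -- 0.1 has no exact binary representation, so error accumulates
    set @d = @d + 0.1;  -- decimal arithmetic is exact at this scale
    set @i = @i + 1;
end
select float_total         = @f
     , decimal_total       = @d
     , float_equals_one    = case when @f = 1.0 then 'yes' else 'no' end  -- 'no'
     , decimal_equals_one  = case when @d = 1.0 then 'yes' else 'no' end; -- 'yes'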
Here's my test lab for a number with 3 digits to the right of the decimal point.
First, the setup script, then the resultsets showing the storage of 2+ million rows in each single-column table.
create table realtest (real1 float(24))
create table floattest (float53 float) --float(53) is the default
create table decimal9test (decimal9 decimal(9,3))
create table decimal19test (decimal19 decimal(19,3))
go
insert into realtest (real1) values (123456.123)
insert into floattest (float53) values (123456.123)
insert into decimal9test (decimal9) values (123456.123)
insert into decimal19test (decimal19) values (123456.123)
go
insert into realtest (real1) select real1 - 1. from realtest
insert into floattest (float53) select float53 - 1. from floattest
insert into decimal9test (decimal9) select decimal9 - 1. from decimal9test
insert into decimal19test (decimal19) select decimal19 - 1. from decimal19test
go 21 --each pass doubles the row count: 2^21 = 2,097,152 rows per table
go
select sizemb = SUM(sizemb), tablename
from (
    select SizeMb = (p.in_row_reserved_page_count*8.)/1024.
         , tablename = o.name
         , indexname = i.name
    from sys.dm_db_partition_stats p
    inner join sys.objects o on p.object_id = o.object_id
    inner join sys.indexes i on i.object_id = o.object_id and i.index_id = p.index_id
    where o.is_ms_shipped = 0
) x
where tablename in ('floattest', 'realtest', 'decimal9test', 'decimal19test')
group by tablename
order by sizemb desc
go
select top 1 * from realtest
select top 1 * from floattest
select top 1 * from decimal9test
select top 1 * from decimal19test
go
tablename | row_count | sizemb |
---|---|---|
realtest | 2097152 | 26.382812 |
floattest | 2097152 | 34.445312 |
decimal9test | 2097152 | 28.382812 |
decimal19test | 2097152 | 36.507812 |
Here are the values that were actually stored; note the important differences.
column | value stored |
---|---|
real1 | 123456.1 |
float53 | 123456.123 |
decimal9 | 123456.123 |
decimal19 | 123456.123 |
You can see that the ~10mb saved by the real (float(24)) data type isn't worth much, given the catastrophic loss of precision. And even at 2 million rows, float saved only about 2mb compared to decimal(19,3).
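You don't need the tables to see the single-precision truncation; a quick cast reproduces it (a sketch; display rounding varies by client, but the precision loss is in the stored value itself):

select real_value    = cast(123456.123 as real)           -- displays with ~7 significant digits: 123456.1
     , float_value   = cast(123456.123 as float)          -- 123456.123
     , decimal_value = cast(123456.123 as decimal(19,3)); -- 123456.123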
Now, let's run the same test, this time with the value 123456.1234567. (The decimal9test table has been eliminated from this round because decimal(9,3) has a maximum precision of 9 and a scale of only 3, so it cannot store this value without rounding it.)
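As a quick aside (not part of the test script), you can see why: converting the longer value to decimal(9,3) silently rounds the extra digits away, while a wider decimal holds it exactly.

select cast(123456.1234567 as decimal(9,3));  -- returns 123456.123: only 3 decimal places survive
select cast(123456.1234567 as decimal(19,7)); -- returns 123456.1234567: precision 19, scale 7 holds it exactly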
create table realtest (real1 float(24))
create table floattest (float53 float) --float(53) is the default
create table decimal19test (decimal19 decimal(19,7))
...
tablename | row_count | sizemb |
---|---|---|
realtest | 2097152 | 26.382812 |
floattest | 2097152 | 34.445312 |
decimal19test | 2097152 | 36.507812 |
column | value stored |
---|---|
real1 | 123456.1 |
float53 | 123456.1234567 |
decimal19 | 123456.1234567 |
Note again that real is wholly unacceptable, with its silent loss of precision, while float and an appropriately sized decimal data type both store the data faithfully, and without a significant difference in storage (less than a 6% difference).
Now, to slam the door on float.
Let's apply DATA_COMPRESSION = PAGE to this data.
create table realtest (real1 float(24)) with (data_compression = page)
create table floattest (float53 float) with (data_compression = page)
create table decimal19test (decimal19 decimal(19,7)) with (data_compression = page)
...
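If the test tables already exist uncompressed, the same setting can be applied in place with a rebuild instead of recreating them; a minimal sketch against the tables above:

alter table realtest rebuild partition = all with (data_compression = page);
alter table floattest rebuild partition = all with (data_compression = page);
alter table decimal19test rebuild partition = all with (data_compression = page);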
tablename | row_count | sizemb |
---|---|---|
floattest | 2097152 | 26.382812 |
decimal19test | 2097152 | 24.320312 |
We can clearly see that page compression reverses the small advantage that floattest enjoyed.
Since page data compression is a very popular, very widespread and very useful tool on modern Enterprise edition SQL Servers, we can rule out float as having any space saving advantage. In short, float loses the last ground it stood on.
The real and float data type columns only compressed by 15% and 23% respectively. The fixed decimal data type compressed, in this case, by 43.5% (36.5mb to 24.3mb).
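If you want to estimate what page compression would save on your own tables before rebuilding anything, SQL Server ships an estimator; here's a sketch pointed at this test's decimal table (swap in your own schema and object names):

exec sp_estimate_data_compression_savings
      @schema_name      = 'dbo'
    , @object_name      = 'decimal19test'
    , @index_id         = null
    , @partition_number = null
    , @data_compression = 'PAGE';
-- returns current and estimated compressed sizes (in KB) per partition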
If we scale the page compressed data comparison up to 134 million rows...
tablename | row_count | sizemb |
---|---|---|
floattest | 134217728 | 1686.070312 |
decimal19test | 134217728 | 1556.015625 |
...we confirm that floattest has lost its so-called space savings advantage, and is more than 7% larger than the same data in a proper decimal data type.
2 comments:
Has any testing been done to determine the relative query performance of, say, SUM, AVG, STDEV, etc. for DECIMAL vs. FLOAT?
CPUs have hardware support for operations on REAL/FLOAT types, but DECIMAL would need at least some degree of software implementation (possibly slower).
More generally, this discussion can be simplified to a horses-for-courses argument. Each of these types has appropriate use cases.
Great analysis! I didn't know decimal compresses more than float, but I would suspect that it may depend on what specific numbers or distribution of numbers you are storing. I had to chuckle about your caveat about 'appropriately sized' and how we had to quickly throw out real for not storing large or high-precision data. I always thought the value of float was that you don't have to accurately predict what size of data you will need to store. Sometimes requirements change over time, and if you incorrectly size a decimal it can also cause 'catastrophic' problems. Sometimes people use float, not because they are lazy, but because they are planning for future growth. For any number you can store in a decimal, there is a larger number you can store in a float.