Elasticsearch Data Types

Today, we’re giving you a crash course on Elasticsearch data types. Built on Apache Lucene, Elasticsearch is an open-source solution that provides developers with a search and analytics engine capable of working with a variety of data types. 

Elasticsearch groups its field data types into several families: object, text, numeric, geospatial, structured, aggregate, and document-ranking types.

If you don’t understand what those terms encompass, you’ve come to the right place. In this article, we’ll break down what some of these types of data are and give you some examples so you can build on your foundational knowledge of Elasticsearch. Once you have this foundation, you’ll be able to build better search and analytics algorithms for all your projects.

Structured vs. Unstructured Data in Elasticsearch

To really understand the different data types in Elasticsearch, you need to grasp the differences between structured and unstructured data. Structured data is really quite simple—it’s data that can be organized in a specific way using a certain format. 

One great example of this that many developers can understand is a relational database. If you have a database set up to keep track of employee data, as a simple example, you might have an ID field that stores an integer, a first name field that stores a string, and a last name field that stores another string. This is structured because it has to follow the rules specified by your relational database.

Unstructured data, on the other hand, doesn’t follow many rules. Let’s say you’re working at a company with a document server that stores PDFs or text files covering a variety of information. Because these documents don’t share a common format or schema, their contents are considered unstructured. However, you still need to be able to search through this data.

Elasticsearch comes in and allows your company to index this unstructured data in a structured format, making searches through this server much easier, faster, and more efficient. 

It does this by building an inverted index: a lookup structure that records, for every term, which documents contain it. This essentially turns your server into its own search engine. How each piece of data is indexed depends on the data type it is mapped to, such as the textual, numerical, and geospatial types covered in the rest of this article.
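To make the idea of an inverted index more concrete, here is a minimal Python sketch. It is a toy illustration only, not how Elasticsearch or Lucene is implemented; the `tokenize`, `build_inverted_index`, and `search` helpers and the sample documents are all made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Toy analyzer: lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents standing in for files on a document server.
docs = {
    1: "Quarterly sales report for the Boston office",
    2: "Employee handbook: vacation and sales policies",
    3: "Boston office floor plan",
}
index = build_inverted_index(docs)
print(search(index, "boston office"))  # documents 1 and 3
print(search(index, "sales"))          # documents 1 and 2
```

Looking up a term is a single dictionary access, which is why searching an inverted index is much faster than scanning every document.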

Textual Data Types in Elasticsearch

| Type | Description |
| --- | --- |
| text | standard field for full text |
| match_only_text | space-optimized version of text |
| annotated_text | text that contains special markup (provided by the mapper-annotated-text plugin) |
| completion | used for auto-complete features |
| search_as_you_type | offers as-you-type completion |
| token_count | counts the number of tokens in a text |

One of the most commonly used textual data types in Elasticsearch is text, because full-text search is the core job of a search engine. The content of each text field is passed through an analyzer, which converts the string into a list of individual terms before it is indexed. This data type is useful for content such as:

  • Blog posts or articles
  • Email content
  • Descriptions of products on a webpage
  • Electronic books

Essentially, Elasticsearch can be set up to search through any number of long or short strings of text. This feature might be used as a web search engine, a find/replace feature in word processing software, or a search feature in an email client. Without this capability, it would be much harder to parse through large collections of information.

The match_only_text type behaves much like text, with a few differences. It exists mainly to save disk space: it does not store the scoring and position information that a text field keeps. That saving comes with some downsides:

  • Term queries are as fast as on text fields, if not faster, but queries that need positions (such as phrase queries) are slower because Elasticsearch has to look at the source document to verify each match.
  • The analysis cannot be configured; the field always uses the default analyzer.
  • There is no support for span queries.
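As a rough illustration of how these two types might sit side by side, the sketch below builds an index mapping as a plain Python dictionary, which is the shape of the JSON body you would send when creating an index. The field names (`title`, `body`, `log_line`) are hypothetical, and the choice of the `english` analyzer is just an example.

```python
import json

# Hypothetical mapping: the field names are invented for illustration.
mapping = {
    "mappings": {
        "properties": {
            # text fields accept a custom analyzer.
            "title": {"type": "text", "analyzer": "english"},
            "body": {"type": "text"},
            # match_only_text takes no analyzer setting; it always uses the default.
            "log_line": {"type": "match_only_text"},
        }
    }
}

print(json.dumps(mapping, indent=2))
```

A field like `log_line` is a plausible fit for match_only_text because log messages are usually filtered rather than ranked by relevance.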

Numerical Data Types in Elasticsearch

| Type | Description |
| --- | --- |
| long | signed 64-bit integer |
| integer | signed 32-bit integer |
| short | signed 16-bit integer |
| byte | signed 8-bit integer |
| double | double-precision 64-bit floating-point number |
| float | single-precision 32-bit floating-point number |
| half_float | half-precision 16-bit floating-point number |
| scaled_float | floating-point number backed by a long and scaled by a fixed double scaling factor |
| unsigned_long | unsigned 64-bit integer |

Even if you’re a new developer, you should recognize many of these data types because they are fairly common in many programming languages. Which numerical data type you plan to use will depend on your use case. As an example, here is a small table that will give you an idea of the upper and lower limits of each integer type:

| Numerical Data Type | Lower Limit | Upper Limit |
| --- | --- | --- |
| long | -9,223,372,036,854,775,808 | 9,223,372,036,854,775,807 |
| integer | -2,147,483,648 | 2,147,483,647 |
| short | -32,768 | 32,767 |
| byte | -128 | 127 |
| unsigned_long | 0 | 18,446,744,073,709,551,615 (or 2^64 - 1) |
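These limits follow directly from each type's bit width. The Python sketch below derives them and picks the narrowest integer type that can hold a given range of values. The helper names (`signed_range`, `unsigned_range`, `smallest_type`) are made up for this example; Elasticsearch does not choose a type for you this way.

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Limits of a signed two's-complement integer with the given bit width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_range(bits: int) -> tuple[int, int]:
    """Limits of an unsigned integer with the given bit width."""
    return 0, 2 ** bits - 1


# Ordered from narrowest to widest so the first match is the smallest fit.
INTEGER_TYPES = [
    ("byte", signed_range(8)),
    ("short", signed_range(16)),
    ("integer", signed_range(32)),
    ("long", signed_range(64)),
    ("unsigned_long", unsigned_range(64)),
]


def smallest_type(low: int, high: int) -> str:
    """Return the narrowest integer type whose limits cover [low, high]."""
    for name, (type_low, type_high) in INTEGER_TYPES:
        if type_low <= low and high <= type_high:
            return name
    raise ValueError(f"no integer type can hold the range {low}..{high}")


print(signed_range(32))          # (-2147483648, 2147483647)
print(smallest_type(0, 100))     # byte
print(smallest_type(-1, 40000))  # integer
```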

However, one of those numerical data types might need a bit more explanation. Let’s talk about the scaled_float.

Let’s say we have a field that’s mapped with this data type. There are two important things to note:

  1. The value is stored internally as a long integer.
  2. The mapping defines a fixed scaling factor.

In this scenario, if the field’s value is 5.6 and the scaling factor is 10, Elasticsearch multiplies the value by the factor, rounds the result to the nearest long, and stores 56. To get the original value back, it divides the stored number by 10. On the surface, it might seem counterintuitive to store numbers this way. However, integers compress more easily than floating-point numbers, so this approach saves disk space, at the cost of any precision beyond what the scaling factor preserves.
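The arithmetic is small enough to sketch in a few lines of Python. The `to_scaled` and `from_scaled` helpers are hypothetical names used only for illustration; the sketch mimics the multiply-and-round idea and does not claim to reproduce Elasticsearch's exact rounding of tie values.

```python
def to_scaled(value: float, scaling_factor: float) -> int:
    """Multiply by the scaling factor and round to the nearest integer.

    Note: Python's round() breaks exact ties toward the even integer,
    which may differ from Elasticsearch for values exactly halfway.
    """
    return round(value * scaling_factor)


def from_scaled(stored: int, scaling_factor: float) -> float:
    """Recover the approximate original value from the stored integer."""
    return stored / scaling_factor


stored = to_scaled(5.6, 10)
print(stored)                   # 56
print(from_scaled(stored, 10))  # 5.6

# Precision beyond the scaling factor is lost: 5.64 also becomes 56.
print(from_scaled(to_scaled(5.64, 10), 10))  # 5.6
```

A scaling factor of 100 would keep two decimal places instead of one, which is a common choice for prices.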

Geospatial Data Types in Elasticsearch

| Type | Description |
| --- | --- |
| geo_point | latitude and longitude points |
| geo_shape | complex shapes (ex: polygon) |
| point | arbitrary cartesian points |
| shape | arbitrary cartesian geometries |

Geospatial data plays an important role in many modern search applications, because the most useful result often depends on where the user is. The data type you’ll use most often in this scenario is geo_point. Here are some ways those latitude and longitude points can be useful:

  • Finding geopoints within a specific distance of a location
  • Aggregating documents by their distance or geographic grids
  • Weaving distance into a search’s relevance
  • Sorting documents by their distance
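To show what "within a specific distance" and "sorting by distance" mean in practice, here is a small Python sketch that uses the haversine formula on latitude and longitude pairs. It is an illustration of the idea only and not how Elasticsearch computes distances internally; the `haversine_km` and `within` helpers and the sample places are assumptions made for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within(places: dict[str, tuple[float, float]],
           origin: tuple[float, float],
           radius_km: float) -> list[tuple[str, float]]:
    """Return (name, distance) pairs inside the radius, nearest first."""
    hits = []
    for name, (lat, lon) in places.items():
        distance = haversine_km(origin[0], origin[1], lat, lon)
        if distance <= radius_km:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical documents, each with a (latitude, longitude) location.
places = {
    "Paris": (48.8566, 2.3522),
    "New York": (40.7128, -74.0060),
    "London": (51.5074, -0.1278),
}

for name, distance in within(places, origin=(51.5074, -0.1278), radius_km=500):
    print(f"{name}: {distance:.0f} km")
```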

However, there are situations where indexing and querying data by its shape makes more sense than working with individual points. A geo_shape field can hold geometries such as lines, polygons, and rectangles, much like you’d find in geometry. Elasticsearch indexes a shape by decomposing it into a triangular mesh and storing each triangle as a multidimensional point, which preserves an accurate spatial resolution of the shape.

The point data type allows you to index x, y pairs in a two-dimensional coordinate system. There are five ways to express a point when indexing it:

  1. As a GeoJSON-style object, having type and coordinates keys.
  2. As a Well-Known Text (WKT) string, formatted this way: “POINT (x y)”.
  3. As an object, having x and y keys.
  4. As an array, formatted this way: [ x, y ].
  5. As a string, formatted this way: “x,y”.
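The five formats listed above all describe the same x, y pair. The Python sketch below normalizes each of them into a tuple, which may help when you prepare documents before indexing. The `parse_point` helper is hypothetical and only illustrates the formats; it is not part of any Elasticsearch client library.

```python
import re

# Matches Well-Known Text such as "POINT (1.5 -2.0)" or "POINT(1.5 -2.0)".
_WKT_POINT = re.compile(r"^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$", re.IGNORECASE)


def parse_point(value) -> tuple[float, float]:
    """Normalize any of the five point formats into an (x, y) tuple."""
    if isinstance(value, dict):
        if "coordinates" in value:          # {"type": "Point", "coordinates": [x, y]}
            x, y = value["coordinates"]
        else:                               # {"x": x, "y": y}
            x, y = value["x"], value["y"]
    elif isinstance(value, (list, tuple)):  # [x, y]
        x, y = value
    elif isinstance(value, str):
        match = _WKT_POINT.match(value)
        if match:                           # "POINT (x y)"
            x, y = match.groups()
        else:                               # "x,y"
            x, y = value.split(",")
    else:
        raise TypeError(f"unsupported point format: {value!r}")
    return float(x), float(y)


examples = [
    {"type": "Point", "coordinates": [1.5, -2.0]},
    "POINT (1.5 -2.0)",
    {"x": 1.5, "y": -2.0},
    [1.5, -2.0],
    "1.5,-2.0",
]
for example in examples:
    print(parse_point(example))  # (1.5, -2.0) each time
```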

Similar to the point data type, Elasticsearch’s shape type lets you index and search arbitrary geometries (such as rectangles or polygons) using coordinates in a two-dimensional Cartesian system. Documents with a shape field can be found using shape queries, which match documents based on how their shape relates to a query shape, for example whether the two intersect.

Practical Applications of Elasticsearch Data Types

Now that you know the difference between the main Elasticsearch data types, how can you apply them to your actual projects? All of this theory is useless if you don’t have anything to apply it to. Let’s break down the best practical applications for each data type and instances where you might see them deployed.

1. Textual Data Types

Digital Publishing: Elasticsearch’s textual data types come into play in the world of digital publishing. Publishers use Elasticsearch to create full-text search capabilities on their websites. This allows users to quickly find articles, blog posts, or other pieces of content that are relevant to their search terms.

Customer Support: Textual data types also prove useful in customer support scenarios. Businesses can use Elasticsearch to index and search through a vast database of support tickets, enabling quicker resolution of customer issues.

2. Numerical Data Types

E-commerce: Elasticsearch’s numerical data types find applications in e-commerce platforms. These data types can be used for efficient sorting and filtering of products based on numerical attributes such as price, ratings, or quantities available.

Stock Market Analysis: The financial industry relies heavily on numerical data types in Elasticsearch. Stock market data can be indexed in Elasticsearch to perform real-time analytics, pattern detection, and forecasting.
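To make the e-commerce case above a little more concrete, the sketch below builds a search request body that filters products by price and rating and sorts them by price. The `price_filter_query` helper and the `price` and `rating` field names are assumptions for this example; the body uses the standard range query and sort syntax, but you would adapt it to your own mapping.

```python
import json


def price_filter_query(min_price: float, max_price: float, min_rating: float) -> dict:
    """Build a request body that filters on two numeric fields and sorts by price."""
    return {
        "query": {
            "bool": {
                # Filters narrow the result set without affecting relevance scores.
                "filter": [
                    {"range": {"price": {"gte": min_price, "lte": max_price}}},
                    {"range": {"rating": {"gte": min_rating}}},
                ]
            }
        },
        "sort": [{"price": {"order": "asc"}}],
    }


print(json.dumps(price_filter_query(10, 50, 4.0), indent=2))
```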

3. Geospatial Data Types

Food Delivery Services: Geospatial data types play an important role in food delivery and ride-hailing services. They use Elasticsearch to quickly match customers with the closest drivers or restaurants within a specified radius.

Real Estate Platforms: Real estate platforms also benefit from Elasticsearch’s geospatial data types. Indexing properties with their geolocation allows users of the platform to search for properties within a certain geographical range or in a specific neighborhood.
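Both of these geospatial use cases come down to a radius search around a point. As a hedged sketch, the Python below builds a request body that combines a geo_distance filter with a distance sort. The `nearby_query` helper and the `location` field name are hypothetical; the field is assumed to be mapped as geo_point.

```python
import json


def nearby_query(lat: float, lon: float, radius: str) -> dict:
    """Build a request body that finds documents within a radius, nearest first."""
    origin = {"lat": lat, "lon": lon}
    return {
        "query": {
            "bool": {
                "filter": {
                    # "location" is assumed to be a geo_point field.
                    "geo_distance": {"distance": radius, "location": origin}
                }
            }
        },
        "sort": [
            {"_geo_distance": {"location": origin, "order": "asc", "unit": "km"}}
        ],
    }


# Example: restaurants or listings within 5 km of a point in Boston.
print(json.dumps(nearby_query(42.3601, -71.0589, "5km"), indent=2))
```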

Final Thoughts

This guide only scratches the surface of Elasticsearch’s capabilities. While it may seem complex at first, once you get the hang of it, Elasticsearch’s usefulness really becomes apparent. Let’s jump into some frequently asked questions.

Elasticsearch Data Types FAQs (Frequently Asked Questions) 

On what platforms can I use Elasticsearch?

You can run Elasticsearch on your own machines (Linux, macOS, or Windows) or in the cloud on Amazon Web Services (AWS), Google Cloud Platform, or Microsoft Azure.

What is Elasticsearch best used for?

Elasticsearch excels at allowing you to store your unstructured data in a structured index, making it much more efficient to search through and analyze large amounts of data you need to do business. Some of the most common use cases are data analytics, autocompletion, document processing, and all types of searches.

What’s the difference between structured and unstructured data?

Unstructured data lives in documents and websites and doesn’t follow any strict rules for how this data is stored. However, structured data has limits (such as type of data, length, etc.) and is often found in relational databases and spreadsheets.

Are Elasticsearch and Amazon’s Elasticsearch (now called OpenSearch) the same thing?

No. Amazon OpenSearch Service used to be known as Amazon Elasticsearch Service. OpenSearch is a fork of an older version of Elasticsearch (7.10), so it doesn’t include the features added to Elasticsearch since then. The only official Elasticsearch service is the one offered by Elastic, the company behind Elasticsearch, and it can be deployed on AWS and other cloud platforms.

What is the difference between text and match_only_text?

These two data types are similar, but they have some distinctions. The biggest difference is that match_only_text disables scoring and does not index term positions. That saves disk space, at the cost of relevance ranking and slower positional queries such as phrase queries.

What is a search analyzer?

A search analyzer converts the text of a query into tokens at search time, so that the query terms are in the same format as the terms stored in Elasticsearch’s inverted index. When using the text data type, you can customize the analyzer, whereas match_only_text always uses the default analyzer.
