ClickHouse OSS quick start

In this quick start tutorial, we'll get you set up with OSS ClickHouse in a few easy steps. You'll download the ClickHouse binary, start ClickHouse, create a table, insert data into it, and run a SELECT query.

Install and start ClickHouse

Download the ClickHouse binary and start an interactive session:

curl https://clickhouse.com/ | sh
./clickhouse

You should see a smiling face as ClickHouse starts up:

local-host :)

You're now in an interactive clickhouse-local session, where you can run SQL commands.

That's all it takes to get started. The details below explain what just happened and where to go next.

The first command downloads a single clickhouse binary into the current directory (ClickHouse runs natively on Linux and macOS, and on Windows via the WSL). The second starts clickhouse-local, which needs no configuration. The same binary can also run clickhouse-server and clickhouse-client.

For Mac users

If you're getting errors that the developer of the binary can't be verified, please see here.

You also installed clickhousectl

The same install script also installs clickhousectl, the ClickHouse CLI, into ~/.local/bin (with a chctl alias). clickhousectl helps you install and manage multiple local ClickHouse versions, run servers in the background, and manage ClickHouse Cloud. See the clickhousectl install page for the CLI-based workflow.

To download just the clickhouse binary without clickhousectl, set CLICKHOUSE_ONLY=1:

curl https://clickhouse.com/ | CLICKHOUSE_ONLY=1 sh

Note

clickhouse-local stores table data in a temporary location, so tables you create are no longer available after you restart it. To persist data, run a ClickHouse server instead — see the Quick install page for the clickhouse-server and clickhouse-client workflow. Note that neither is the recommended way to install ClickHouse for production; for that, see the install page.

Create a table

Use CREATE TABLE to define a new table. Typical SQL DDL commands work in ClickHouse with one addition - tables in ClickHouse require an ENGINE clause. Use MergeTree to take advantage of the performance benefits of ClickHouse:

CREATE TABLE my_first_table
(
    user_id UInt32,
    message String,
    timestamp DateTime,
    metric Float32
)
ENGINE = MergeTree
PRIMARY KEY (user_id, timestamp)

Insert data

You can use the familiar INSERT INTO TABLE command with ClickHouse, but it is important to understand that each insert into a MergeTree table causes what we call a part in ClickHouse to be created in storage. These parts later get merged in the background by ClickHouse.

In ClickHouse, we try to bulk insert lots of rows at a time (tens of thousands or even millions at once) to minimize the number of parts that need to get merged in the background process.

In this guide, we won't worry about that just yet. Run the following command to insert a few rows of data into your table:

INSERT INTO my_first_table (user_id, message, timestamp, metric) VALUES
    (101, 'Hello, ClickHouse!',                                 now(),       -1.0    ),
    (102, 'Insert a lot of rows per batch',                     yesterday(), 1.41421 ),
    (102, 'Sort your data based on your commonly-used queries', today(),     2.718   ),
    (101, 'Granules are the smallest chunks of data read',      now() + 5,   3.14159 )

Query your new table

You can write a SELECT query just like you would with any SQL database:

SELECT *
FROM my_first_table
ORDER BY timestamp

Notice the response comes back in a nice table format:

┌─user_id─┬─message────────────────────────────────────────────┬───────────timestamp─┬──metric─┐
│     102 │ Insert a lot of rows per batch                     │ 2022-03-21 00:00:00 │ 1.41421 │
│     102 │ Sort your data based on your commonly-used queries │ 2022-03-22 00:00:00 │   2.718 │
│     101 │ Hello, ClickHouse!                                 │ 2022-03-22 14:04:09 │      -1 │
│     101 │ Granules are the smallest chunks of data read      │ 2022-03-22 14:04:14 │ 3.14159 │
└─────────┴────────────────────────────────────────────────────┴─────────────────────┴─────────┘

4 rows in set. Elapsed: 0.008 sec.

Insert your own data

The next step is to get your own data into ClickHouse. We have lots of table functions and integrations for ingesting data. We have some examples in the tabs below, or you can check out our Integrations page for a long list of technologies that integrate with ClickHouse.

Use the s3 table function to read files from S3. It's a table function - meaning that the result is a table that can be:

used as the source of a SELECT query (allowing you to run ad-hoc queries and leave your data in S3), or...
insert the resulting table into a MergeTree table (when you're ready to move your data into ClickHouse)

An ad-hoc query looks like:

SELECT
passenger_count,
avg(toFloat32(total_amount))
FROM s3(
'https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_0.gz',
'TabSeparatedWithNames'
)
GROUP BY passenger_count
ORDER BY passenger_count;

Moving the data into a ClickHouse table looks like the following, where nyc_taxi is a MergeTree table:

INSERT INTO nyc_taxi
SELECT * FROM s3(
'https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_0.gz',
'TabSeparatedWithNames'
)
SETTINGS input_format_allow_errors_num=25000;

View our collection of AWS S3 documentation pages for lots more details and examples of using S3 with ClickHouse.

The s3 table function used for reading data in AWS S3 also works on files in Google Cloud Storage.

For example:

SELECT
*
FROM s3(
'https://storage.googleapis.com/my-bucket/trips.parquet',
'MY_GCS_HMAC_KEY',
'MY_GCS_HMAC_SECRET_KEY',
'Parquet'
)
LIMIT 1000

Find more details on the s3 table function page.

The url table function reads files accessible from the web:

--By default, ClickHouse prevents redirects to protect from SSRF attacks.
--The URL below requires a redirect, so we must set max_http_get_redirects > 0.
SET max_http_get_redirects=10;

SELECT *
FROM url(
'http://prod2.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv',
'CSV'
);

Find more details on the url table function page.

Use the file table engine to read a local file. For simplicity, copy the file to the user_files directory (which is found in the directory where you downloaded the ClickHouse binary).

DESCRIBE TABLE file('comments.tsv')

Query id: 8ca9b2f9-65a2-4982-954a-890de710a336

┌─name──────┬─type────────────────────┐
│ id        │ Nullable(Int64)         │
│ type      │ Nullable(String)        │
│ author    │ Nullable(String)        │
│ timestamp │ Nullable(DateTime64(9)) │
│ comment   │ Nullable(String)        │
│ children  │ Array(Nullable(Int64))  │
└───────────┴─────────────────────────┘

Notice ClickHouse infers the names and data types of your columns by analyzing a large batch of rows. If ClickHouse can not determine the file format from the filename, you can specify it as the second argument:

SELECT count()
FROM file(
'comments.tsv',
'TabSeparatedWithNames'
)

View the file table function docs page for more details.

Use the postgresql table function to read data from a table in PostgreSQL:

SELECT *
FROM
postgresql(
'localhost:5432',
'my_database',
'my_table',
'postgresql_user',
'password')
;

View the postgresql table function docs page for more details.

Use the mysql table function to read data from a table in MySQL:

SELECT *
FROM
mysql(
'localhost:3306',
'my_database',
'my_table',
'mysql_user',
'password')
;

View the mysql table function docs page for more details.

ClickHouse can read data from any ODBC or JDBC data source:

SELECT *
FROM
odbc(
'DSN=mysqlconn',
'my_database',
'my_table'
);

View the odbc table function and the jdbc table function docs pages for more details.

Message queues can stream data into ClickHouse using the corresponding table engine, including:

Kafka: integrate with Kafka using the Kafka table engine
Amazon MSK: integrate with Amazon Managed Streaming for Apache Kafka (MSK)
RabbitMQ: integrate with RabbitMQ using the RabbitMQ table engine

ClickHouse has table functions to read data from the following sources:

Hadoop: integrate with Apache Hadoop using the hdfs table function
Hudi: read from existing Apache Hudi tables in S3 using the hudi table function
Iceberg: read from existing Apache Iceberg tables in S3 using the iceberg table function
DeltaLake: read from existing Delta Lake tables in S3 using the deltaLake table function

Explore

Check out our Core Concepts section to learn some of the fundamentals of how ClickHouse works under the hood.
Check out the Advanced Tutorial which takes a much deeper dive into the key concepts and capabilities of ClickHouse.
Continue your learning by taking our free on-demand training courses at the ClickHouse Academy.
We have a list of example datasets with instructions on how to insert them.
If your data is coming from an external source, view our collection of integration guides for connecting to message queues, databases, pipelines and more.
If you're using a UI/BI visualization tool, view the user guides for connecting a UI to ClickHouse.
The user guide on primary keys is everything you need to know about primary keys and how to define them.

Install and start ClickHouse​

Create a table​

Insert data​

Query your new table​

Insert your own data​

Explore​