diff --git a/docs/integrations/language-clients/java/client-v2.md b/docs/integrations/language-clients/java/client-v2.md
index 8ec3b9935da..8f6fe93b6d7 100644
--- a/docs/integrations/language-clients/java/client-v2.md
+++ b/docs/integrations/language-clients/java/client-v2.md
@@ -111,6 +111,25 @@ Please use tools like [openssl](https://docs.openssl.org/master/man1/openssl/) t
 :::
 
+## Writing Data
+This section describes common scenarios of writing data to ClickHouse. The client has different API methods for different use cases:
+- `insert(String tableName, InputStream data, ClickHouseFormat format, InsertSettings settings)` - should be used to write data in a text format. The input stream defined by `data` is compressed according to the settings. Data encoding is done by the application.
+- `insert(String tableName, List data, InsertSettings settings)` - should be used to write a list of POJOs (or DTOs). The client encodes the data as RowBinary and handles serialization according to the table schema of the table `tableName`. The stream is compressed according to the settings. Can be used for big datasets.
+- `insert(String tableName, DataStreamWriter writer, ClickHouseFormat format, InsertSettings settings)` - a more advanced version of the first API method. It accepts a functional interface implementation that controls how data is written to the server. This method is useful when transcoding data into a byte stream is not wanted, or when reading data from a queue. It also allows application-side compression when the data is already compressed, for example, as LZ4 frames, although with some limitations.
+
+The speed of a write operation is, first of all, defined by how fast the server processes the data. It can be slow if the data requires a lot of parsing, so operation performance is strongly affected by the data format. Please read our blog [post](https://clickhouse.com/blog/clickhouse-input-format-matchup-which-is-fastest-most-efficient) about input formats.
The client uses RowBinary by default, but you may choose a format better suited to your data.
+Configuration is the next place where performance can be improved. The client builder method `setClientNetworkBufferSize(int size)` should be used to configure the size of the buffer that sits between the socket and the client. This setting is important because it defines how many socket I/O operations are needed to send the data. When the buffer is too small, it causes many calls to the OS, which is a potential bottleneck. When the buffer is too big compared to the socket buffers, system performance, and available memory, it causes slowness because copying data from the heap to OS buffers is a very expensive operation. A bigger buffer also means more memory is needed per request.
+
+
+## Reading Data
+This section describes common scenarios of reading data from ClickHouse. The client has different API methods for different use cases:
+- `query(String sqlQuery, QuerySettings settings)` and `query(String sqlQuery, Map queryParams, QuerySettings settings)` - the base methods for most query requests. This pair of methods accepts a raw SQL query and returns a response object with a raw input stream of bytes. They should be used for big queries because access to the raw byte stream allows the most performant way of reading data. Their main benefit is that they allow streaming the data and avoid allocating a lot of memory.
+- `queryAll(String sqlQuery, QuerySettings settings)` and `queryAll(String sqlQuery, Map params, QuerySettings settings)` - methods designed to simplify fetching a small amount of data as an iterable collection of records. They should be used only to fetch a small number of rows because they read the result from the server fully, which may cause a significant peak in memory usage, especially in highly concurrent applications.
+- `queryAll(String sqlQuery, Class clazz, TableSchema schema, Supplier allocator)` - for reading a result set directly into plain Java objects (DTOs). The method is suitable for results of any size: it uses precompiled serializers to minimize per-operation overhead and does not hold the connection after reading.
+
+Data can be fetched in any output format that ClickHouse supports. The client tries to use RowBinaryWithNamesAndTypes by default because this format carries the metadata definition in its header lines and is more compact for network transfer.
+As mentioned in the "Writing Data" section, the client builder method `setClientNetworkBufferSize(int size)` works for reads in the same way as for writes.
+
 ## Configuration {#configuration}
 
 All settings are defined by instance methods (a.k.a configuration methods) that make the scope and context of each value clear.
@@ -352,6 +371,128 @@ try (InsertResponse response = client.insert(TABLE_NAME, events).get()) {
 }
 ```
 
+### insert(String tableName, DataStreamWriter writer, ClickHouseFormat format, InsertSettings settings)
+**Beta**
+
+This API method lets you pass a writer object that encodes data directly into an output stream. The data is compressed by the client.
+There is a configuration option in `InsertSettings` called `appCompressedData` that turns off client compression and lets the application send an already compressed stream.
+The examples below show the major use cases this API was designed for.
+
+`com.clickhouse.client.api.DataStreamWriter` is a functional interface with a method `onOutput` that is called by the client when the output stream is ready for data to be written. The interface has
+another method, `onRetry`, with a default implementation. It is called when retry logic is triggered and is mainly used to reset the data source, if applicable.
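To illustrate how `onRetry` can rewind a data source, here is a minimal, self-contained sketch. The `DataStreamWriter` interface declared below is a simplified stand-in with an assumed shape (`onOutput` plus a defaulted `onRetry`), not the real `com.clickhouse.client.api` definition:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class RetryAwareWriterSketch {

    // Simplified stand-in for com.clickhouse.client.api.DataStreamWriter;
    // the shape (onOutput + defaulted onRetry) is an assumption for illustration.
    interface DataStreamWriter {
        void onOutput(OutputStream out) throws IOException;

        // Called when client retry logic is triggered; default is a no-op.
        default void onRetry() throws IOException {}
    }

    // A writer over an in-memory row list that can be replayed on retry.
    static class ReplayableWriter implements DataStreamWriter {
        private final String[] rows;
        private int next = 0;

        ReplayableWriter(String[] rows) { this.rows = rows; }

        @Override
        public void onOutput(OutputStream out) throws IOException {
            while (next < rows.length) {
                out.write(rows[next].getBytes(StandardCharsets.UTF_8));
                out.write('\n'); // newline-delimited rows
                next++;
            }
        }

        @Override
        public void onRetry() {
            next = 0; // rewind so the whole payload is re-sent on retry
        }
    }

    public static void main(String[] args) throws IOException {
        ReplayableWriter writer = new ReplayableWriter(new String[]{"{\"a\":1}", "{\"a\":2}"});

        ByteArrayOutputStream first = new ByteArrayOutputStream();
        writer.onOutput(first);

        writer.onRetry(); // simulate a client-triggered retry
        ByteArrayOutputStream second = new ByteArrayOutputStream();
        writer.onOutput(second);

        // Both attempts produce identical payloads.
        System.out.println(first.toString(StandardCharsets.UTF_8)
                .equals(second.toString(StandardCharsets.UTF_8)));
    }
}
```

Without such a reset, a retried request would send an empty or partial payload, which is why the docs note that `onRetry` is mainly used to reset the data source.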
+
+
+**Signatures**
+```java
+CompletableFuture<InsertResponse> insert(String tableName,        // name of destination table
+                                         DataStreamWriter writer, // data writer instance
+                                         ClickHouseFormat format, // data format in which the writer encodes data
+                                         InsertSettings settings) // operation settings
+```
+
+**Parameters**
+
+`tableName` - name of the target table.
+
+`writer` - data writer instance.
+
+`format` - data format in which the writer encodes data.
+
+`settings` - request settings.
+
+**Return value**
+
+Future of `InsertResponse` type - the result of the operation and additional information like server-side metrics.
+
+**Examples**
+
+Writing a collection of JSON objects encoded as string values using the `JSONEachRow` format:
+```java showLineNumbers
+
+final int EXECUTE_CMD_TIMEOUT = 10; // seconds
+final String tableName = "events";
+final String tableCreate = "CREATE TABLE \"" + tableName + "\" " +
+        " (name String, " +
+        "  v1 Float32, " +
+        "  v2 Float32, " +
+        "  attrs Nullable(String), " +
+        "  corrected_time DateTime('UTC') DEFAULT now()," +
+        "  special_attr Nullable(Int8) DEFAULT -1)" +
+        " Engine = MergeTree ORDER BY ()";
+
+client.execute("DROP TABLE IF EXISTS " + tableName).get(EXECUTE_CMD_TIMEOUT, TimeUnit.SECONDS);
+client.execute(tableCreate).get(EXECUTE_CMD_TIMEOUT, TimeUnit.SECONDS);
+
+String correctedTime = Instant.now().atZone(ZoneId.of("UTC")).format(DataTypeUtils.DATETIME_FORMATTER);
+String[] rows = new String[] {
+        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6, \"attrs\": \"a=1,b=2,c=5\", \"corrected_time\": \"" + correctedTime + "\", \"special_attr\": 10}",
+        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6, \"attrs\": \"a=1,b=2,c=5\", \"corrected_time\": \"" + correctedTime + "\"}",
+        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6, \"attrs\": \"a=1,b=2,c=5\" }",
+        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6 }",
+};
+
+try (InsertResponse response = client.insert(tableName, out -> {
+    // writing raw bytes
+    for (String row : rows) {
+        out.write(row.getBytes(StandardCharsets.UTF_8));
+        out.write('\n'); // JSONEachRow rows are newline-separated
+    }
+}, ClickHouseFormat.JSONEachRow, new InsertSettings()).get()) {
+
+    System.out.println("Rows written: " + response.getWrittenRows());
+}
+```
+
+Writing already compressed data:
+```java showLineNumbers
+final int EXECUTE_CMD_TIMEOUT = 10; // seconds
+String tableName = "very_long_table_name_with_uuid_" + UUID.randomUUID().toString().replace('-', '_');
+String tableCreate = "CREATE TABLE \"" + tableName + "\" " +
+        " (name String, " +
+        "  v1 Float32, " +
+        "  v2 Float32, " +
+        "  attrs Nullable(String), " +
+        "  corrected_time DateTime('UTC') DEFAULT now()," +
+        "  special_attr Nullable(Int8) DEFAULT -1)" +
+        " Engine = MergeTree ORDER BY ()";
+
+client.execute("DROP TABLE IF EXISTS " + tableName).get(EXECUTE_CMD_TIMEOUT, TimeUnit.SECONDS);
+client.execute(tableCreate).get(EXECUTE_CMD_TIMEOUT, TimeUnit.SECONDS);
+
+String correctedTime = Instant.now().atZone(ZoneId.of("UTC")).format(DataTypeUtils.DATETIME_FORMATTER);
+String[] data = new String[] {
+        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6, \"attrs\": \"a=1,b=2,c=5\", \"corrected_time\": \"" + correctedTime + "\", \"special_attr\": 10}",
+        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6, \"attrs\": \"a=1,b=2,c=5\", \"corrected_time\": \"" + correctedTime + "\"}",
+        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6, \"attrs\": \"a=1,b=2,c=5\" }",
+        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6 }",
+};
+
+// This step is only for the showcase. A real application would already have compressed data.
+byte[][] compressedData = new byte[data.length][];
+for (int i = 0; i < data.length; i++) {
+    ByteArrayOutputStream baos = new ByteArrayOutputStream();
+    try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
+        gz.write((data[i] + "\n").getBytes(StandardCharsets.UTF_8)); // newline-separated JSONEachRow
+    }
+    compressedData[i] = baos.toByteArray();
+}
+
+InsertSettings insertSettings = new InsertSettings()
+        .appCompressedData(true, "gzip"); // defining compression algorithm (sent via HTTP headers)
+
+try (InsertResponse response = client.insert(tableName, out -> {
+    // Writing data
+    for (byte[] row : compressedData) {
+        out.write(row);
+    }
+}, ClickHouseFormat.JSONEachRow, insertSettings).get()) {
+    System.out.println("Rows written: " + response.getWrittenRows());
+}
+```
+
 ### InsertSettings {#insertsettings}
 
 Configuration options for insert operations.
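One option worth noting is `appCompressedData`, used in the example above: when enabled, the server receives exactly the bytes the writer emits, so the application is responsible for producing a valid compressed stream. The per-row compression shown there relies on concatenated gzip members decompressing back into one stream. A stdlib-only sketch (independent of the client) demonstrating that property:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipMembersSketch {

    // Compress one row as a standalone gzip member, as in the insert example.
    static byte[] gzipRow(String row) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write((row + "\n").getBytes(StandardCharsets.UTF_8));
        }
        return baos.toByteArray();
    }

    // Decompress a concatenation of gzip members back into one string.
    static String gunzipAll(byte[] compressed) throws IOException {
        // GZIPInputStream reads across concatenated members transparently.
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String[] rows = { "{\"v\":1}", "{\"v\":2}" };
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        for (String row : rows) {
            stream.write(gzipRow(row)); // one gzip member per row, concatenated
        }
        // Decompresses to the original newline-delimited rows.
        System.out.println(gunzipAll(stream.toByteArray()));
    }
}
```

Because each row is an independent member, rows can be compressed ahead of time (or by different producers) and streamed to the server without re-encoding.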