Quantcast
Channel: PHP – Articles
Viewing all articles
Browse latest Browse all 22

PHP and SQLite FTS4: How to process the matchinfo function part II

0
0

Understanding the output from the matchinfo() function

Part I: Reading the binary output from the matchinfo() function

To analyze what the output of the matchinfo() function means, we’ll use a different example dataset than the one from the SQLite FTS4 documentation. Since I’m a bird photographer, let’s create a virtual FTS4 table holding information about photos of birds. It consists of the five columns id, title, description, species and common name (note: id is the image id and not the FTS4 rowid). Then we fill the table with four records making up our example data:

// create the example database
try {
  $db = new PDO('sqlite:example.sqlite');
} catch (PDOException $error) {
  echo $error->getMessage();
}

// create a virtual fts4 table and populate it with example data
try {
  $db->exec("CREATE VIRTUAL TABLE images USING fts4(imgId, title, description, species, speciesEn);
    INSERT INTO images VALUES(1, 'Great Spotted Woodpecker', 'A great spotted woodpecker with a caterpillar in its beak.', 'Dendrocopos major', 'Great Spotted woodpecker');
    INSERT INTO images VALUES(2, 'Woodpecker at the pond', 'A green woodpecker drinks water.', 'Picus viridis', 'Green Woodpecker');
    INSERT INTO images VALUES(3, 'Woodpecker', 'A middle spotted woodpecker is looking for food on an oak tree.', 'Dendrocopos medius', 'Middle Spotted Woodpecker');
    INSERT INTO images VALUES(4, 'Woodpecker', 'A lesser yellownape showing its green wings.', 'Picus chlorolophus', 'Lesser Yellownape');");
} catch (PDOException $error) {
  echo $error->getMessage().'<br>';
}

idNext we perform a FTS4 SELECT query to search for green woodpecker using the matchinfo() function. We also use a helper function to format the output visually in groups of three integers (for the sake of this example). Note that the query uses an implicit AND operator. If explicit AND queries are supported (e.g. green AND woodpecker) depends on whether enhanced query syntax is available or not. Wrap words in double quotes to search for the whole phrase, e.g “green woodpecker”.

// use matchinfo when searching for green woodpecker using an implicit AND operator
$data = $db->query("SELECT imgId, MATCHINFO(images) info FROM images WHERE images MATCH 'green woodpecker'");

// convert the binary output to integers and format integers in groups of three
while ($row = $data->fetch(PDO::FETCH_ASSOC)) {
  // matchinfo returns a blob of 32-bit unsigned integers in machine byte-order
  $arrInt32 = unpack('L*', $row['info']);
  echo 'row '.$row['imgId'].': '.formatOutput($arrInt32).'<br>';
}

/**
 * Format the array of integers with spaces between according to the example in the SQLite FTS documentation
 * @param array $arrInt32
 */
function formatOutput($arrInt32)
{
  $str = '' ;
  foreach ($arrInt32 as $i => $int) {
    $str.= ($i % 3 === 0 ? ' ' : '').$int;
  }
  
  return $str;
}

Our FTS4 search query will match records with id 2 and id 4. These are the only two rows of our table, where both words green and woodpecker occur. The converted and formatted output of the matchinfo function looks like the following:

row 2: 25 000 000 232 000 111 000 144 133 000 133
row 4: 25 000 000 132 000 011 000 144 033 000 033

The first two integers are simply the total number of search words of the query (green + woodpecker = 2) and the total number of columns in the table (id + title + description + species + speciesEn = 5).

Next, follow ten groups of three integers each. Each group represents a combination of the search words with a column (e.g. 2 * 5 = 10).

The first integer in the group gives the number of times the word (or phrase) matched within the content of the column of the current row. The second gives the number of times the word (or phrase) matched in the column of all rows in the table. The last number gives the number of rows where the word (or phrase) matched at least once.

For example, let’s look at the third group of integers from the output of matchinfo for row 2, e.g. 232. It represents the combination of the word green by the table column description:

FTS4
green woodpecker
id title description species speciesEn id title description species speciesEn
000 000 232 000 111 000 144 133 000 133
000 000 132 000 011 000144 033 000 033
Matrix of the search words green woodpecker by table columns as returned by the matchinfo function. See text to understand the meaning of the color coding

The word green matched in the column description twice within the content of the current row, a total of three times in all rows and it matched two rows overall:

idtitledescriptionspeciesspeciesEn
1Great Spotted WoodpeckerA great spotted woodpecker drinks water.Dendrocopos majorGreat Spotted Woodpecker
2A Woodpecker at the pondA green woodpecker with a green caterpillar in its beak.Picus viridisGreen Woodpecker
3WoodpeckerA middle spotted woodpecker is looking for food on an oak tree.Dendrocopos mediusMiddle Spotted Woodpecker
4WoodpeckerA lesser yellownape showing its green wings.Picus chlorolophusLesser Yellownape
Tabular representation of our example data in the virtual table images containing four records about photos with birds. The color coding shows the different number of times the word green matched within the column description of the search query green woodpecker.

To complete the second part of this series, we will interpret another integer triplet: integers 144 of the seventh group representing woodpecker by title of either the first or second result of the search query. The word woodpecker matched in the column title once within the content of the current row, four times in the content of all rows and it matched within four rows overall.


Viewing all articles
Browse latest Browse all 22

Latest Images

Trending Articles





Latest Images