The document describes the discovery process used by Atrium Discovery to scan devices on a network. It involves the following key steps:
1. Scanning IPs to determine whether devices are reachable and to detect open ports. Credentials are then used to attempt access to the devices.
2. Classifying devices and collecting additional information when a host is detected. Cached credentials are used for faster access in future scans.
3. Optimizing scans to avoid rescanning the same host multiple times and to minimize network load.
4. Restricting discovery of sensitive devices; full discovery only occurs if the required information can be retrieved from the host.
The next four slides of animation show the basic approach to discovery alongside the nodes that are built in the model. The emphasis is that everything we do is recorded.
Pinging before scanning lets us optimise detection of real devices that will respond to discovery, as opposed to dark space. Advanced Use: “Ping hosts before scanning” can be disabled globally for environments that suppress ICMP, at the expense of slower performance in dark space. If the environment allows, consider using a TCP ACK or TCP SYN ping in place of the standard ICMP ping (“Use TCP ACK ping before scanning”, “Use TCP SYN ping before scanning”), or use “Exclude ranges from ping” if only a small area of the environment is affected (a DMZ, for example).
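The pre-scan ping decision can be sketched as a small function. This is a minimal sketch, assuming the options behave as described above; `PingSettings`, its field names and the probe labels are hypothetical stand-ins for the UI options, not the product's configuration API.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List


@dataclass
class PingSettings:
    # Hypothetical mirror of the UI options; not the product's real configuration API.
    ping_hosts_before_scanning: bool = True
    use_tcp_ack_ping: bool = False
    use_tcp_syn_ping: bool = False
    exclude_ranges_from_ping: List[str] = field(default_factory=list)


def ping_probes(ip: str, settings: PingSettings) -> List[str]:
    """Return the probes to send before scanning; an empty list means go straight to the port scan."""
    if not settings.ping_hosts_before_scanning:
        return []  # ping disabled globally (e.g. ICMP suppressed); dark space will be slower
    addr = ipaddress.ip_address(ip)
    for cidr in settings.exclude_ranges_from_ping:
        if addr in ipaddress.ip_network(cidr, strict=False):
            return []  # endpoint sits in a range excluded from ping (e.g. a DMZ)
    tcp = []
    if settings.use_tcp_ack_ping:
        tcp.append("tcp-ack")
    if settings.use_tcp_syn_ping:
        tcp.append("tcp-syn")
    # Assumption: enabled TCP pings replace the standard ICMP ping.
    return tcp or ["icmp"]


# Example: a DMZ range excluded from ping, everything else uses ICMP.
settings = PingSettings(exclude_ranges_from_ping=["192.0.2.0/24"])
print(ping_probes("192.0.2.10", settings))    # []
print(ping_probes("198.51.100.7", settings))  # ['icmp']
```

Excluding a range keeps the speed benefit of pinging everywhere else, which is why the notes suggest it over disabling ping globally.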
If the endpoint responds to ping, discovery goes on to look for open ports. If the estate is hardened, discovery can have difficulty detecting open ports; in these situations consider modifying the discovery configuration setting “Valid Port States”. Contact support for advice before making any modifications.
It’s important that the appliance can see these ports as open (or regarded as valid; see the notes on the ports slide), otherwise discovery will not proceed. This list of ports has been aggressively honed from experience to focus only on regular, stable service ports that carry minimal risk whilst still allowing effective discovery. Attempting to use fewer ports will reduce the quality and stability of discovery.
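The effect of “Valid Port States” on whether discovery proceeds can be sketched as follows. This is a minimal sketch under stated assumptions: the state names follow the nmap convention, a default valid set of `{"open"}` is assumed, and `EXAMPLE_ACCESS_PORTS` is only the subset of TCP ports mentioned in these notes, not the appliance's full port list.

```python
from typing import AbstractSet, Dict, Set

# Assumption: by default only "open" counts as a valid port state.
DEFAULT_VALID_PORT_STATES = frozenset({"open"})
# Illustrative subset of access-method ports named in these notes (SSH, telnet, Windows RPC, rlogin).
EXAMPLE_ACCESS_PORTS = frozenset({22, 23, 135, 513})


def valid_ports(scan: Dict[int, str], valid_states: AbstractSet[str] = DEFAULT_VALID_PORT_STATES) -> Set[int]:
    """Ports whose scanned state counts as valid under the current setting."""
    return {port for port, state in scan.items() if state in valid_states}


def can_proceed(scan: Dict[int, str], valid_states: AbstractSet[str] = DEFAULT_VALID_PORT_STATES) -> bool:
    """Discovery continues only if at least one access-method port is seen as valid."""
    return bool(valid_ports(scan, valid_states) & EXAMPLE_ACCESS_PORTS)


# A hardened host where the firewall reports SSH as "filtered" rather than "open".
hardened = {22: "filtered", 135: "closed"}
print(can_proceed(hardened))                                     # False
print(can_proceed(hardened, valid_states={"open", "filtered"}))  # True
```

Treating filtered ports as valid is exactly the kind of change the notes recommend discussing with support first, since it widens what discovery will attempt.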
Depending on the Dark Space settings, we may or may not retain DiscoveryAccess nodes marked as NoResponse.
UNIX methods will only be tried if the appliance can detect an open UNIX port (22 SSH, 23 telnet, 513 rlogin) on the endpoint and there is a credential for that endpoint *and* port in the vault.
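The UNIX gating rule amounts to an intersection of detected ports and credential ports. A minimal sketch, assuming the vault lookup has already been flattened into the set of ports that credentials matching this endpoint permit; the function name and that flattening are hypothetical, and the order of attempts shown is illustrative.

```python
from typing import AbstractSet, List

# UNIX access ports named in the notes.
UNIX_PORTS = {22: "ssh", 23: "telnet", 513: "rlogin"}


def unix_methods_to_try(open_ports: AbstractSet[int], credential_ports: AbstractSet[int]) -> List[str]:
    """Return the UNIX methods allowed for one endpoint.

    A method is eligible only if its port is detected open on the endpoint AND a
    vault credential matching this endpoint permits that port.
    """
    return [
        method
        for port, method in sorted(UNIX_PORTS.items())
        if port in open_ports and port in credential_ports
    ]


print(unix_methods_to_try({22, 80}, {22, 23}))  # ['ssh']
print(unix_methods_to_try({23, 513}, {22}))     # []
```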
If the slaves are restricted, only the slaves valid for that endpoint are used. It is a common source of confusion, but vital to understand, that the slave is only a proxy and not a distributed discovery agent. If the appliance cannot detect that port 135 (Windows RPC) is open on the endpoint, discovery will not attempt to use any Windows slave. This is often an issue when clients deploy Windows slaves in protected areas of the network on the assumption that this will allow scanning. It will not; in this situation, using multiple appliances with consolidation is the correct deployment. Advanced Use: If there is no option but to place the appliance where it cannot detect port 135 on the endpoint, “Check port 135 before using Windows access methods” can be set to “no”. The appliance will then direct every discovery request that does not respond to a UNIX method through all registered slaves in sequence, which makes discovery take significantly longer per endpoint and noticeably degrades performance.
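The port 135 check and slave restriction can be sketched as a function that returns the Windows slaves to contact, in order, for one endpoint. This is a minimal sketch, assuming slaves are identified by name and any restriction has already been resolved to a list of slaves valid for the endpoint; the names are hypothetical, and how restrictions interact with the check disabled is an assumption.

```python
from typing import List, Optional, Sequence


def windows_slaves_to_try(
    port_135_open: bool,
    registered_slaves: Sequence[str],
    slaves_valid_for_endpoint: Optional[Sequence[str]] = None,
    check_port_135: bool = True,
) -> List[str]:
    """Return the Windows slaves (proxies) to try, in sequence, for one endpoint.

    slaves_valid_for_endpoint=None means the slaves are unrestricted.
    """
    if check_port_135 and not port_135_open:
        # The slave is only a proxy: if the appliance itself cannot see port 135,
        # no Windows slave is used, wherever the slaves are deployed.
        return []
    candidates = list(registered_slaves)
    if slaves_valid_for_endpoint is not None:
        allowed = set(slaves_valid_for_endpoint)
        candidates = [s for s in candidates if s in allowed]
    return candidates


slaves = ["slave-dc1", "slave-dmz"]
print(windows_slaves_to_try(False, slaves))                        # []
print(windows_slaves_to_try(True, slaves, ["slave-dmz"]))          # ['slave-dmz']
print(windows_slaves_to_try(False, slaves, check_port_135=False))  # ['slave-dc1', 'slave-dmz']
```

With the check disabled, every endpoint that fails the UNIX methods walks the whole slave list, which is why the notes warn of a significant slowdown per endpoint.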
The SNMP discovery methods are more limited and should be regarded as fallback methods, as they provide only basic information: no access to files or running of commands is possible. The SNMP port is 161 (UDP). Operating systems currently supported in this fashion are IBM i (formerly OS/400), NetWare, OpenVMS and z/OS (formerly OS/390). NetWare is only available via SNMP v1.
If the access methods have failed so far, discovery will attempt the following methods to try to identify the device. If the device has SNMP port 161 open, discovery will try to recover basic system information using the public community string. IP Stack Fingerprinting exploits the close relationship between an IP stack and an OS: because each OS normally has a dedicated IP stack, it is often possible to determine the OS quite accurately. For IP Stack Fingerprinting to work well, however, it needs to investigate closed as well as open ports. We use port 4 as the closed port; for the open ports we use only the ports used by our access methods. If the device has telnet port 23 open, the banner is frequently presented before the login prompt, and this provides information about the device and its OS. Similarly, a simple HTTP HEAD request is used if port 80 is open; the response will often contain information about the device and its OS. All these methods are required for credentialless scanning. Disabling or modifying them is not recommended, because without them identifying the Hosts that need credentials deployed is very inefficient. Advanced Use: IP Fingerprinting can be turned off by setting the “Use IP Fingerprinting to Identify OS” option to “no”, or the list of ports used for fingerprinting can be altered; neither is recommended. Telnet banner sampling can be turned off using the “Use Telnet Banner to Identify OS” option, SNMP SysDescr using the “Use SNMP SysDescr to Identify OS” option, and HTTP HEAD using the “Use HTTP HEAD Request to Identify OS” option. Contact support before attempting to change these settings.
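The order and preconditions of the credentialless identification methods can be sketched like this. A minimal sketch, assuming each option simply enables or disables one probe; `IdentifySettings`, the method labels and the `EXAMPLE_ACCESS_PORTS` subset are hypothetical, and the probes themselves (SNMP, fingerprinting, banner and HTTP requests) are not implemented here.

```python
from dataclasses import dataclass
from typing import AbstractSet, List

CLOSED_FINGERPRINT_PORT = 4  # closed port used by IP Stack Fingerprinting, per the notes
EXAMPLE_ACCESS_PORTS = frozenset({22, 23, 135, 513})  # illustrative subset only


@dataclass
class IdentifySettings:
    # Hypothetical mirrors of the "... to Identify OS" options.
    use_snmp_sysdescr: bool = True
    use_ip_fingerprinting: bool = True
    use_telnet_banner: bool = True
    use_http_head: bool = True


def fingerprint_ports(open_tcp: AbstractSet[int]) -> List[int]:
    """Closed port 4 plus whichever access-method ports are open."""
    return [CLOSED_FINGERPRINT_PORT] + sorted(set(open_tcp) & EXAMPLE_ACCESS_PORTS)


def identification_plan(open_tcp: AbstractSet[int], snmp_161_open: bool, s: IdentifySettings) -> List[str]:
    """Fallback probes to run, in the order the notes describe them."""
    plan = []
    if s.use_snmp_sysdescr and snmp_161_open:
        plan.append("snmp_sysdescr")  # queried with the public community string
    if s.use_ip_fingerprinting:
        plan.append("ip_fingerprint")
    if s.use_telnet_banner and 23 in open_tcp:
        plan.append("telnet_banner")
    if s.use_http_head and 80 in open_tcp:
        plan.append("http_head")
    return plan


print(identification_plan({23, 80}, True, IdentifySettings()))
# ['snmp_sysdescr', 'ip_fingerprint', 'telnet_banner', 'http_head']
print(fingerprint_ports({22, 80, 135}))  # [4, 22, 135]
```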
At this stage we have already had a successful getDeviceInfo, as we have an active session. In later modules we will refer back to the fact that these three methods (DeviceInfo, HostInfo and InterfaceList) need to succeed in order to create or update a Host node.
Without successfully completing DeviceInfo, HostInfo and InterfaceList, we do not have enough information to feed the Host Identification algorithm. The system *can* cope with partial results from those methods, although the identity of the Host will be less stable the fewer properties it has to work on. Common reasons for these methods not completing:
1. Credential permissions.
2. Poor edits to scripts, with uncaught stderr or other script termination issues.
3. Login timeout: check for a timeout.
4. Script timeout: check for a timeout ScriptFailure related to the method, and increase the credential timeout to 180 seconds.
5. Parse failure (or incomplete DeviceInfo): check for a parse failure ScriptFailure related to the method, and check for scrambled session output. Turn on session logging and check for out-of-sequence characters. Consider increasing the Session Line Delay.
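The checklist of common failure reasons can be sketched as a small triage helper that reports whether all three core methods completed and suggests where to look for any that did not. A minimal sketch; the failure-reason keys are hypothetical labels for the reasons listed above, not end_state or ScriptFailure values produced by the product.

```python
from typing import Dict, List, Optional, Tuple

CORE_METHODS = ("DeviceInfo", "HostInfo", "InterfaceList")

# Hypothetical reason keys mapped to the remediation advice in the notes.
REMEDIATION = {
    "credential_permissions": "Check the permissions granted to the credential.",
    "script_error": "Review edited scripts for uncaught stderr or other termination issues.",
    "login_timeout": "Check for a login timeout.",
    "script_timeout": "Check for a timeout ScriptFailure; increase the credential timeout to 180 seconds.",
    "parse_failure": "Check for a parse failure ScriptFailure and scrambled output; "
                     "enable session logging and consider a larger Session Line Delay.",
}


def triage(results: Dict[str, Optional[str]]) -> Tuple[str, List[str]]:
    """results maps a core method to None (completed) or a failure-reason key.

    A method missing from results is treated as not run.
    """
    advice = []
    for method in CORE_METHODS:
        reason = results.get(method, "not_run")
        if reason is not None:
            hint = REMEDIATION.get(reason, "Method did not run or reason unknown; review the session log.")
            advice.append(f"{method}: {hint}")
    status = "complete" if not advice else "incomplete"
    return status, advice


status, advice = triage({"DeviceInfo": None, "HostInfo": "script_timeout", "InterfaceList": None})
print(status)     # incomplete
print(advice[0])  # HostInfo: Check for a timeout ScriptFailure; increase the credential timeout to 180 seconds.
```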
The Host Algorithm uses a weighting technique to compute a key. The weighting compares the current properties with those of existing candidate Host nodes. If there is a significant difference, a new Host.key is generated; otherwise the closest match is used. This allows a certain amount of change (such as upgrading an OS or changing a NIC) without forcing a new identity. We cannot compare against every existing Host, so we pre-select candidates. These include the Host that this endpoint was associated with last time, Hosts with interfaces on the same IP as the current endpoint, and Hosts with the same serial number as the current properties.
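The candidate pre-selection and weighted matching can be sketched as below. This is a minimal sketch of the general technique only: the property names, weights, threshold and use of a random UUID for a new key are all hypothetical, and the product's real algorithm and values are not documented in these notes.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical weights and threshold, chosen only to illustrate the idea.
WEIGHTS = {"serial": 5.0, "hostname": 3.0, "mac": 3.0, "os_version": 1.0}
MATCH_THRESHOLD = 0.6


@dataclass
class HostRecord:
    key: str
    props: Dict[str, str]
    ips: Set[str] = field(default_factory=set)


def similarity(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Weighted fraction of properties that are present and equal in both."""
    matched = sum(w for k, w in WEIGHTS.items() if a.get(k) is not None and a.get(k) == b.get(k))
    return matched / sum(WEIGHTS.values())


def preselect(current: Dict[str, str], endpoint_ip: str, last_key: Optional[str],
              hosts: List[HostRecord]) -> List[HostRecord]:
    """Candidates: last Host for this endpoint, Hosts with an interface on this IP, same serial."""
    serial = current.get("serial")
    return [
        h for h in hosts
        if h.key == last_key or endpoint_ip in h.ips or (serial and h.props.get("serial") == serial)
    ]


def host_key(current: Dict[str, str], endpoint_ip: str, last_key: Optional[str],
             hosts: List[HostRecord]) -> str:
    candidates = preselect(current, endpoint_ip, last_key, hosts)
    best = max(candidates, key=lambda h: similarity(current, h.props), default=None)
    if best is not None and similarity(current, best.props) >= MATCH_THRESHOLD:
        return best.key  # close enough: keep the existing identity
    return uuid.uuid4().hex  # significant difference: generate a new key


web01 = HostRecord("h1", {"serial": "S1", "hostname": "web01", "mac": "aa", "os_version": "5.1"}, {"10.0.0.5"})
# OS upgraded and NIC replaced, but serial and hostname still match: 8/12 >= 0.6.
print(host_key({"serial": "S1", "hostname": "web01", "mac": "bb", "os_version": "6.0"}, "10.0.0.5", "h1", [web01]))  # h1
```

Pre-selecting candidates keeps the comparison cheap, and the threshold is what lets an OS upgrade or NIC swap keep the same identity.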
end_state relates only to establishing a good-quality session to the endpoint and relating it to an existing node.
On the first scan the only likely cause is “Excluded”. Very rarely you can get “OptAlreadyProcessing” if the same endpoint is injected while one is still in the queue (see later slide). On later scans we can also get OptNotBestIP and OptRemote from the optimization systems:
OptNotBestIP – we know this endpoint was optimized last time, so we assume it will be this time and do not contact it.
OptRemote – only seen on a Consolidation Appliance. It means that the endpoint was optimized on the Scanning Appliance; full details of the state will be on the Scanning Appliance.
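The states that can stop an endpoint before any access attempt can be sketched as a single decision. A minimal sketch, assuming each condition is already known as a boolean; the parameter names are hypothetical and the order in which the checks are applied is an assumption.

```python
from typing import Optional


def pre_access_state(
    *,
    excluded: bool,
    already_processing: bool,
    consolidation_appliance: bool = False,
    optimized_on_scanner: bool = False,
    not_best_ip_last_time: bool = False,
) -> Optional[str]:
    """Return the end_state that skips this endpoint, or None to go on and ping/scan it."""
    if excluded:
        return "Excluded"
    if already_processing:
        return "OptAlreadyProcessing"  # same endpoint already queued or being scanned
    if consolidation_appliance and optimized_on_scanner:
        return "OptRemote"  # full details live on the Scanning Appliance
    if not_best_ip_last_time:
        return "OptNotBestIP"  # optimized last time, so not contacted this time
    return None


# On a first scan the optimization flags are all False, so only the first two states can apply.
print(pre_access_state(excluded=False, already_processing=False))  # None
print(pre_access_state(excluded=False, already_processing=False, not_best_ip_last_time=True))  # OptNotBestIP
```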
Why do we still go to the OS/Device classifier, rather than further down? Because if we are using widely deployed credentials, they may well work on another Host, and we still have to check that this is the same Host and not another one that the credentials work on which has moved to this IP since we last scanned.
Under some conditions the same IP can be requested while another scan of the same IP is still in progress. To prevent collisions, if a duplicate is detected, one of the endpoints is skipped (end_state OptAlreadyProcessing).
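One common way to implement this kind of duplicate detection is a lock-protected set of endpoints currently being scanned. This is a minimal sketch of that general technique, assuming scans run on threads within one process; it is not a description of the appliance's actual queueing, and `InProgressRegistry` and `scan_endpoint` are hypothetical names.

```python
import threading
from typing import Set


class InProgressRegistry:
    """Tracks endpoints being scanned so a duplicate request can be skipped."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: Set[str] = set()

    def try_start(self, ip: str) -> bool:
        """Atomically claim an endpoint; False means a scan of it is already in progress."""
        with self._lock:
            if ip in self._active:
                return False
            self._active.add(ip)
            return True

    def finish(self, ip: str) -> None:
        with self._lock:
            self._active.discard(ip)


def scan_endpoint(registry: InProgressRegistry, ip: str) -> str:
    if not registry.try_start(ip):
        return "OptAlreadyProcessing"
    try:
        return "scanned"  # placeholder for the real ping, port scan and access work
    finally:
        registry.finish(ip)
```

The check and the add happen under one lock, so two threads racing on the same IP cannot both claim it; the loser is skipped, as with the OptAlreadyProcessing state described earlier in these notes.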
In general the level of access to the OS is the same over each interface. There is no point scanning the same Host several times through different endpoints in a range (or indeed across ranges), so we should only scan over one of its interfaces.
Essentially, the first endpoint that provides the GoodAccess end_state is the one we will attempt to use. Note that, as we have recovered up-to-date DeviceInfo, HostInfo and InterfaceList, these properties of the Host node are updated. This is fine detail and probably only confuses the issue in an overview, but it is included in the notes for completeness. Sometimes we will talk about the “BestIP”: this is an internal name for the system that picks the highest-quality endpoint, and it is sometimes used to refer to the endpoint that is picked.
By default we will do this every 7 days. Advanced Use: The setting is controlled by the value of “Scan optimization timeout”, and this is a Model Maintenance setting rather than a discovery one. We advise against changing this value without first seeking advice.
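The BestIP optimization and its timeout can be sketched as a small tracker. This is a minimal sketch under stated assumptions: a Host's interface IPs come from InterfaceList, the first GoodAccess endpoint becomes the BestIP, and once the default 7-day “Scan optimization timeout” has elapsed every interface is scanned again. `BestIPTracker` and its methods are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

SCAN_OPTIMIZATION_TIMEOUT = timedelta(days=7)  # default period described in the notes


class BestIPTracker:
    def __init__(self, timeout: timedelta = SCAN_OPTIMIZATION_TIMEOUT) -> None:
        self.timeout = timeout
        self._best: Dict[str, Tuple[str, datetime]] = {}  # Host key -> (BestIP, when chosen)
        self._host_for_ip: Dict[str, str] = {}            # interface IP -> Host key

    def record_good_access(self, host_key: str, ip: str, interface_ips: Iterable[str], now: datetime) -> None:
        """Called when an endpoint reaches GoodAccess; the first one per period becomes the BestIP."""
        current = self._best.get(host_key)
        if current is None or now - current[1] >= self.timeout:
            self._best[host_key] = (ip, now)
        for iface_ip in [ip, *interface_ips]:
            self._host_for_ip[iface_ip] = host_key

    def should_skip(self, ip: str, now: datetime) -> bool:
        """True means the endpoint would be given OptNotBestIP and not contacted."""
        host_key = self._host_for_ip.get(ip)
        if host_key is None:
            return False  # endpoint not linked to a known Host: scan it
        best_ip, chosen_at = self._best[host_key]
        if now - chosen_at >= self.timeout:
            return False  # optimization expired: scan every interface again
        return ip != best_ip


t0 = datetime(2024, 1, 1)
tracker = BestIPTracker()
tracker.record_good_access("h1", "10.0.0.5", ["10.0.0.5", "10.0.1.5"], t0)
print(tracker.should_skip("10.0.1.5", t0 + timedelta(days=1)))  # True
print(tracker.should_skip("10.0.1.5", t0 + timedelta(days=8)))  # False
```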
It’s highly unlikely that you will get an early error state, as that would indicate a fundamental error in the core system, and such errors are picked up in internal testing if they occur. An error from amended discovery scripts is more likely.
Pattern success or failure does not alter the summary states that track session establishment; these have their own tracking methods, which will be described later. Note also that not all the standard discovery scripts may have completed successfully. Again, further tracking methods will be described later to allow any issues to be understood.
This is a subtle change, but the Sweep Scan level never intends to get beyond a DeviceIdentified vs NoResponse state, as it is intended for surveys of the estate during roll-out and sizing of the project. Other scan levels are not included in the chart, as these are the two that should be used in normal operation; other scan levels should only be used under guidance.
You may wish to download the state charts that were used during this module. Please download the chart zip file that should be available where you accessed this module.